PROTEIN-PROTEIN INTERACTIONS
Between Shigella flexneri Polypeptides And Mammalian Polypeptides

PRIORITY 

[0001] This application claims priority on the basis of United States Provisional 
Application No. 60/261,130, filed January 12, 2001, the contents of which are hereby 
incorporated by reference. 
BACKGROUND OF THE INVENTION 

[0002] Most biological processes involve specific protein-protein interactions. Protein- 
protein interactions enable two or more proteins to associate. A large number of non- 
covalent bonds form between the proteins when two protein surfaces are precisely matched. 
These bonds account for the specificity of recognition. Thus, protein-protein interactions are 
involved, for example, in the assembly of enzyme subunits, in antibody-antigen recognition, 
in the formation of biochemical complexes, in the correct folding of proteins, in the 
metabolism of proteins, in the transport of proteins, in the localization of proteins, in protein 
turnover, in post-translational modifications, in the core structures of viruses and in signal
transduction. 

[0003] General methodologies to identify interacting proteins or to study these
interactions have been developed. Among these methods are the two-hybrid system 
originally developed by Fields and co-workers and described, for example, in U.S. Patent 
Nos. 5,283,173, 5,468,614 and 5,667,973, which are hereby incorporated by reference. 

[0004] The earliest and simplest two-hybrid system, which served as the basis for the
development of other versions, is an in vivo assay between two specifically constructed
proteins. The first protein, known in the art as the "bait protein", is a chimeric protein which
binds to a site on DNA upstream of a reporter gene by means of a DNA-binding domain or 
BD. Commonly, the binding domain is the DNA-binding domain from either Gal4 or native E. 
coli LexA and the sites placed upstream of the reporter are Gal4 binding sites or LexA 
operators, respectively. 

[0005] The second protein is also a chimeric protein known as the "prey" in the art. This 
second chimeric protein carries an activation domain or AD. This activation domain is 
typically derived from Gal4, from VP16 or from B42. 

[0006] Besides the two-hybrid systems, other improved systems have been developed
to detect protein-protein interactions. For example, a two-hybrid plus one system was
developed that allows the use of two proteins as bait to screen available cDNA libraries to
detect a third partner. This method permits the detection of interactions between proteins that
are part of a larger protein complex such as the RNA polymerase II holoenzyme and the TFIIH
or TFIID complexes. Therefore, this method, in general, permits the detection of ternary complex
formation as well as inhibitors preventing the interaction between the two previously defined
fused proteins.

[0007] Another advantage of the two-hybrid plus one system is that it allows or prevents 
the formation of the transcriptional activator since the third partner can be expressed from a 
conditional promoter such as the methionine-repressed Met25 promoter which is positively 
regulated in medium lacking methionine. The presence of the methionine-regulated 
promoter provides an excellent control to evaluate the activation or inhibition properties of 
the third partner due to its "on" and "off" switch for the formation of the transcriptional 
activator. The three-hybrid method is described, for example, in Tirode et al., The Journal of
Biological Chemistry, 272, No. 37 pp. 22995-22999 (1997), incorporated herein by
reference.

[0008] Besides the two-hybrid and two-hybrid plus one systems, yet another variant is that
described in Vidal et al., Proc. Natl. Acad. Sci. 93 pgs. 10315-10320, called the reverse two- and
one-hybrid systems, in which a collection of molecules can be screened for those that inhibit a
specific protein-protein or protein/DNA interaction, respectively.

[0009] A summary of the available methodologies for detecting protein-protein
interactions is described in Vidal and Legrain, Nucleic Acids Research Vol. 27, No. 4
pgs. 919-929 (1999) and Legrain and Selig, FEBS Letters 480 pgs. 32-36 (2000), which
references are incorporated herein by reference.

[0010] However, the above conventionally used approaches, and especially the
commonly used two-hybrid methods, have their drawbacks. For example, it is known in the
art that, more often than not, false positives and false negatives exist in the screening
method. In fact, a doctrine has been developed in this field for interpreting the results, and in
common practice an additional technique such as co-immunoprecipitation or gradient
sedimentation of the putative interactors from the appropriate cell or tissue type is
generally performed. The methods used for interpreting the results are described by Brent
and Finley, Jr. in Ann. Rev. Genet., 31 pgs. 663-704 (1997). Thus, data interpretation is
very questionable using the conventional systems.

[0011] One method to overcome the difficulties encountered with the methods in the 
prior art is described in WO 99/42612, incorporated herein by reference. This method is 
similar to the two-hybrid system described in the prior art in that it also uses bait and prey 
polypeptides. However, the difference with this method is that a step of mating at least one 
first haploid recombinant yeast cell containing the prey polypeptide to be assayed with a 
second haploid recombinant yeast cell containing the bait polynucleotide is performed. Of 
course the person skilled in the art would appreciate that either the first recombinant yeast 
cell or the second recombinant yeast cell also contains at least one detectable reporter gene 
that is activated by a polypeptide including a transcriptional activation domain.
[0012] The method described in WO 99/42612 permits the screening of more prey
polynucleotides with a given bait polynucleotide in a single step than in the prior art systems 
due to the cell to cell mating strategy between haploid yeast cells. Furthermore, this method 
is more thorough and reproducible, as well as sensitive. Thus, the presence of false 
negatives and/or false positives is extremely minimal as compared to the conventional prior 
art methods. 

[0013] The genus Shigella includes four species (major serogroups): S. dysenteriae 
(Grp. A), S. flexneri (Grp. B), S. boydii (Grp. C) and S. sonnei (Grp. D) as classified in 
Bergey's Manual for Systematic Bacteriology (N. R. Krieg, ed., pp. 423-427 (1984)). The 
genera Shigella and Escherichia are phylogenetically closely related. Brenner and others 
have suggested that the two are more correctly considered sibling species based on
DNA/DNA reassociation studies (D. J. Brenner et al., International J. Systematic
Bacteriology, 23:1-7 (1973)). These studies showed that Shigella species are on average
80-89% related to E. coli at the DNA level. Also, the degree of relatedness between Shigella
species is on average 80-89%.

[0014] The genus Shigella is pathogenic in humans; it causes bacillary dysentery at
levels of infection of 10 to 100 organisms.

[0015] Shigellosis or bacillary dysentery is a disease that is endemic throughout the
world. The disease presents a particularly serious public health problem in tropical regions
and developing countries where Shigella dysenteriae and S. flexneri predominate. In
industrialized countries, the principal etiologic agent is S. sonnei although sporadic cases of 
shigellosis are encountered due to S. flexneri, S. boydii and certain entero-invasive 
Escherichia coli. 

[0016] The primary step in the pathogenesis of bacillary dysentery is invasion of the 
human colonic mucosa by Shigella (Labrec, E. H., H. Schneider, T. J. Magnani, and S. B. 
Formal. 1964. Epithelial cell penetration as an essential step in the pathogenesis of bacillary 
dysentery. J. Bacteriol. 88:1503). Mucosal invasion encompasses several steps which 
include penetration of the bacteria into epithelial cells, intracellular multiplication, killing of 
host cells, and final spreading to adjacent cells and to connective tissue (Formal, S. B., T. L. 
Hale, and P. J. Sansonetti. 1983. Invasive enteric pathogens. Rev. Infect. Dis. 5:S702, Rout, 
W. R., S. B. Formal, R. A. Giannella, and G. J. Dammin. 1975. The pathophysiology of
Shigella diarrhea in the Rhesus monkey; intestinal transport, morphology and bacteriological 
studies. Gastroenterology 68:270, Takeuchi, A., H. Spring, E. H. LaBrec, and S. B. Formal. 
1965. Experimental acute colitis in the Rhesus monkey following peroral infection with 
Shigella flexneri. Am. J. Pathol. 52:503, Takeuchi, A. 1967. Electron microscope studies of 
experimental Salmonella infection. I. Penetration into cells of the intestinal epithelium by 
Salmonella typhimurium. Am. J. Pathol. 47:1011). The overall process which is usually
limited to the mucosal surface leads to a strong inflammatory reaction which is responsible
for abscesses and ulcerations (Labrec, E. H., H. Schneider, T. J. Magnani, and S. B. Formal. 
1964. Epithelial cell penetration as an essential step in the pathogenesis of bacillary 
dysentery. J. Bacteriol. 88:1503, Rout, W. R., S. B. Formal, R. A. Giannella, and G. J.
Dammin. 1975. The pathophysiology of Shigella diarrhea in the Rhesus monkey; intestinal 
transport, morphology and bacteriological studies. Gastroenterology 68:270, Takeuchi, A., H. 
Spring, E. H. LaBrec, and S. B. Formal. 1965. Experimental acute colitis in the Rhesus 
monkey following peroral infection with Shigella flexneri. Am. J. Pathol. 52:503). 
[0017] Even though dysentery is characteristic of shigellosis, it may be preceded by 
watery diarrhea. Diarrhea appears to be the result of disturbances in colonic reabsorption 
and increased jejunal secretion whereas dysentery is a purely colonic process (Kinsey, M. 
D., S. B. Formal, G. J. Dammin, and R. A. Giannella. 1976. Fluid and electrolyte transport in 
Rhesus monkeys challenged intracecally with Shigella flexneri 2a. Infect. Immun. 14:368).
Complications of shigellosis include toxic megacolon, leukemoid reactions and hemolytic-uremic
syndrome ("HUS"). The latter is a major cause of mortality from shigellosis in developing areas
(Gianantonio, C, H. Vitacco, F. Mendilaharzu, A. Rutty, and J. Mendilaharzu. 1964. The 
hemolytic-uremic syndrome. J. Pediatr. 64:478, Koster, F., J. Levin, L. Walker, K. S. K. 
Tung, R. H. Gilman, M. M. Rajaman, M. A. Majid, S. Islam, and R. C. Williams Jr. 1977. 
Hemolytic-uremic syndrome after shigellosis. Relation to endotoxin and circulating immune
complexes. N. Engl. J. Med. 298:927). 

[0018] The role of Shiga-toxin produced at high level by S. dysenteriae 1 (Conradi, H., 
1903. Ueber lösliche, durch aseptische Autolyse erhaltene Giftstoffe von Ruhr- und
Typhusbazillen. Dtsch. Med. Wochenschr. 29:26) and Shiga-like toxins ("SLT") produced at low
level by S. flexneri and S. sonnei (Keusch, G. T., and M. Jacewicz. 1977. The pathogenesis 
of Shigella diarrhea. VI. Toxin and antitoxin in Shigella flexneri and Shigella sonnei infections 
in humans. J. Infect. Dis. 135:552) in the four major stages of shigellosis (i.e., invasion of 
individual epithelial cells, tissue invasion, diarrhea and systemic symptoms) is not well 
understood. For review see O'Brien and Holmes (O'Brien, A. D., and R. K. Holmes. 1987. 
Shiga and Shiga-like toxins. Microbiol. Rev. 51:206). Plasmids of 180-220 kilobases ("kb") 
are essential in all Shigella species for invasion of individual epithelial cells (Rout, W. R., S. 
B. Formal, R. A. Giannella, and G. J. Dammin. 1975. The pathophysiology of Shigella 
diarrhea in the Rhesus monkey; intestinal transport, morphology and bacteriological studies. 
Gastroenterology 68:270, Sansonetti, P. J., D. J. Kopecko, and S. B. Formal. 1981. Shigella 
sonnei plasmids: evidence that a large plasmid is necessary for virulence. Infect. Immun.
34:75, Sansonetti, P. J., T. L. Hale, G. I. Dammin, C. Kapper, H. H. Collins Jr., and S. B. 
Formal. 1983. Alterations in the pathogenesis of Escherichia coli K12 after transfer of 
plasmids and chromosomal genes from Shigella flexneri. Infect. Immun. 39:1392). This
includes entry, intracellular multiplication and early killing of host cells (Clerc, P., A. Ryter, J.
Mounier, and P. J. Sansonetti. 1987. Plasmid-mediated early killing of eucaryotic cells by 
Shigella flexneri as studied by infection of J774 macrophages. Infect. Immun. 55:521, Clerc, 
P., and P. J. Sansonetti. 1987. Entry of Shigella flexneri into HeLa cells: Evidence for 
directed phagocytosis involving actin polymerization and myosin accumulation. Infect. 
Immun. 55:2681). The role of Shiga-toxin and SLT at this stage is unclear. 
[0019] Recent evidence indicates that Shiga-toxin is cytotoxic for primary cultures of 
human colonic cells (Moyer, M. P., P. S. Dixon, S. W. Rothman, and J. E. Brown. 1987. 
Cytotoxicity of Shiga toxin for human colonic and ileal epithelial cells. Infect. Immun. 
55:1533). Tissue invasion requires additional chromosomally encoded products among 
which are smooth lipopolysaccharides ("LPS") (Sansonetti, P. J., T. L. Hale, G. I. Dammin, 
C. Kapper, H. H. Collins Jr., and S. B. Formal. 1983. Alterations in the pathogenesis of 
Escherichia coli K12 after transfer of plasmids and chromosomal genes from Shigella 
flexneri. Infect. Immun. 39:1392), the non-characterized product of the Kcp locus, and 
aerobactin. A region of the S. flexneri chromosome necessary for fluid production in rabbit 
ileal loops has been localized to the rha-mt1 regions and near the lysine decarboxylase 
locus (Sansonetti, P. J., T. L. Hale, G. I. Dammin, C. Kapper, H. H. Collins Jr., and S. B. 
Formal. 1983. Alterations in the pathogenesis of Escherichia coli K12 after transfer of 
plasmids and chromosomal genes from Shigella flexneri. Infect. Immun. 39:1392). However, 
no evidence has been adduced to show that the ability to cause fluid accumulation is due to 
the SLT of S. flexneri. Thus, the role of Shiga-toxin in causing the systemic complications of 
shigellosis is still hypothetical. However, Shiga-toxin can mediate vascular damage since 
capillary lesions observed in HUS resemble those observed in cerebral vessels of animals 
injected with this toxin (Bridgewater, F. A. I., R. S. Morgan, K. E. K. Rowson, and G. P. 
Wright. 1955. The neurotoxin of Shigella shigae. Morphological and functional lesions
produced in the central nervous system of rabbits. Br. J. Exp. Pathol. 36: 447, Cavanagh, J. 
B., J. G. Howard, and J. L. Whitby. 1956. The neurotoxin of Shigella shigae. A comparative 
study of the effects produced in various laboratory animals. Br. J. Exp. Med. 37:272). 
[0020] As described before, the genera Shigella and Escherichia are phylogenetically
closely related. Furthermore, the pathogenesis of enteroinvasive E. coli is very similar to 
that of Shigella. In both, dysentery results from invasion of the colonic epithelial cells 
followed by intracellular multiplication which leads to bloody, mucous discharge with scanty 
diarrhea. 

[0021] Pathogenic E. coli serotypes are collectively referred to as Enterovirulent E. coli 
(EVEC) (J. R. Lupski, et al., J. Infectious Diseases, 157:1120-1123 (1988); M. M. Levine, J. 
Infectious Diseases, 155:377-389 (1987); M. A. Karmali, Clinical Microbiology Reviews, 
2:15-38 (1989)). This group includes at least 5 subclasses of E. coli, each having a
characteristic pathogenesis pathway resulting in diarrheal disease. The subclasses include
Enterotoxigenic E. coli (ETEC), Verotoxin-Producing E. coli (VTEC), Enteropathogenic E.
coli (EPEC), Enteroadherent E. coli (EAEC) and Enteroinvasive E. coli (EIEC). The VTEC
include Enterohemorrhagic E. coli (EHEC) since these produce verotoxins.
[0022] Thus, detection of Shigella and EIEC is important in various medical contexts. 
For example, the presence of either Shigella or EIEC in stool samples is indicative of 
gastroenteritis, and the ability to screen for their presence is useful in treating and controlling 
that disease. Detection of Shigella or EIEC in any possible transmission vehicle such as food 
is also important to avoid spread of gastroenteritis. 

[0023] That is why there is a great need to construct a Protein Interaction Map between
Shigella polypeptides and human polypeptides in order to understand the mechanisms of
Shigella pathogenesis and to identify drug targets for treating Shigella-associated diseases,
as well as means for detecting Shigella.
SUMMARY OF THE PRESENT INVENTION

[0024] Thus, it is an object of the present invention to identify protein-protein interactions 
between Shigella polypeptides and mammalian, preferably human, polypeptides.

[0025] It is another object of the present invention to identify protein-protein interactions
between Shigella polypeptides and mammalian, preferably human, polypeptides for the
development of more effective and better targeted therapeutic applications.

[0026] It is yet another object of the present invention to identify complexes of 
polypeptides or polynucleotides encoding the polypeptides and fragments of the 
polypeptides of Shigella genus and polypeptides and fragments of the polypeptides of 
mammals, preferably human. 

[0027] It is yet another object of the present invention to identify antibodies to these
complexes of polypeptides or polynucleotides encoding the polypeptides and fragments of 
the polypeptides of Shigella genus and mammals, preferably human, including polyclonal, as 
well as monoclonal antibodies that are used for detection. 

[0028] It is still another object of the present invention to identify selected interacting 
domains of the polypeptides, called SID® polypeptides. 

[0029] It is still another object of the present invention to identify selected interacting
domains of the polynucleotides, called SID® polynucleotides. 

[0030] It is another object of the present invention to generate protein-protein 
interactions maps called PIM®s. 

[0031] It is yet another object of the present invention to provide a method for screening 
drugs for agents which modulate the interaction of proteins and pharmaceutical 
compositions that are capable of modulating the protein-protein interactions between 
Shigella polypeptides and mammalian, preferably human, polypeptides.
[0032] It is another object to administer the nucleic acids of the present invention via
gene therapy. 

[0033] It is yet another object of the present invention to provide protein chips or protein 
microarrays. 

[0034] It is yet another object of the present invention to provide a report in, for example,
paper, electronic and/or digital forms, concerning the protein-protein interactions, the 
modulating compounds and the like as well as a PIM®. 

[0035] Thus the present invention, in one aspect thereof, relates to a protein complex 
between a Shigella polypeptide and a mammalian polypeptide. In another embodiment, the 
Shigella and the mammalian polypeptides are polypeptides set forth on columns 1 and 3 
respectively of Table II. 

[0036] Furthermore, the present invention provides SID® polynucleotides and SID® 
polypeptides of Table III, as well as a PIM® between Shigella polypeptides and mammalian, 
preferably human, polypeptides. 

[0037] The present invention also provides antibodies to the protein-protein complexes 
between Shigella polypeptides and mammalian, preferably human, polypeptides.
[0038] In another embodiment the present invention provides a method for screening
drugs for agents that modulate the protein-protein interactions and pharmaceutical 
compositions that are capable of modulating protein-protein interactions. 
[0039] In another embodiment the present invention provides protein chips or protein 
microarrays. 

[0040] In yet another embodiment the present invention provides a report in, for
example, paper, electronic and/or digital forms. 
BRIEF DESCRIPTION OF THE DRAWINGS 

[0041] Fig. 1 is a schematic representation of the pB1 plasmid. 

[0042] Fig. 2 is a schematic representation of the pB5 plasmid.

[0043] Fig. 3 is a schematic representation of the pB6 plasmid. 

[0044] Fig. 4 is a schematic representation of the pB13 plasmid. 

[0045] Fig. 5 is a schematic representation of the pB14 plasmid. 

[0046] Fig. 6 is a schematic representation of the pB20 plasmid. 

[0047] Fig. 7 is a schematic representation of the pP1 plasmid.

[0048] Fig. 8 is a schematic representation of the pP2 plasmid. 

[0049] Fig. 9 is a schematic representation of the pP3 plasmid. 

[0050] Fig. 10 is a schematic representation of the pP6 plasmid. 

[0051] Fig. 11 is a schematic representation of the pP7 plasmid.

[0052] Fig. 12 is a schematic representation of vectors expressing the T25 fragment. 

[0053] Fig. 13 is a schematic representation of vectors expressing the T18 fragment.

[0054] Fig. 14 is a schematic representation of various vectors of pCmAHU, pT25 and
pT18. 

[0055] Fig. 15 is a schematic representation of identification of SID®. In this figure the 
"Full-length prey protein" is the Open Reading Frame (ORF) or coding sequence (CDS) 
where the identified prey polypeptides are included. The Selected Interaction Domain 
(SID®) is determined by the commonly shared polypeptide domain of every selected prey 
fragment. 
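
The determination of a SID® as the domain commonly shared by every selected prey fragment can be illustrated by a short sketch. The following is only an assumed model, not a procedure taken from this specification: each prey fragment is represented by hypothetical start and end residue positions on the full-length prey protein, and the function name shared_domain is introduced solely for illustration.

```python
from typing import List, Optional, Tuple

# (start, end) residue positions, 1-based and inclusive (an assumed convention)
Interval = Tuple[int, int]


def shared_domain(fragments: List[Interval]) -> Optional[Interval]:
    """Return the region common to every fragment, or None if there is none."""
    if not fragments:
        return None
    # The shared region starts at the latest start and ends at the earliest end.
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None


# Hypothetical prey fragments mapped onto one full-length prey protein.
prey_fragments = [(10, 120), (45, 200), (30, 150)]
print(shared_domain(prey_fragments))
```

In this sketch, fragments that do not all overlap yield no shared domain, which is one possible way to treat such a case.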

[0056] Fig. 16 is a protein map (PIM®). 
DETAILED DESCRIPTION OF THE INVENTION 

[0057] As used herein the terms "polynucleotides", "nucleic acids" and "oligonucleotides" 
are used interchangeably and include, but are not limited to RNA, DNA, RNA/DNA 
sequences of more than one nucleotide in either single chain or duplex form. The 
polynucleotide sequences of the present invention may be prepared from any known method 
including, but not limited to, any synthetic method, any recombinant method, any ex vivo 
generation method and the like, as well as combinations thereof. 

[0058] The term "polypeptide" means herein a polymer of amino acids having no specific 
length. Thus, peptides, oligopeptides and proteins are included in the definition of 
"polypeptide" and these terms are used interchangeably throughout the specification, as well 
as in the claims. The term "polypeptide" does not exclude post-translational modifications 
such as polypeptides having covalent attachment of glycosyl groups, acetyl groups,
phosphate groups, lipid groups and the like. Also encompassed by this definition of 
"polypeptide" are homologs thereof. 

[0059] By the term "homologs" is meant structurally similar genes contained within a
given species; orthologs are functionally equivalent genes from a given species or strain, as
determined, for example, in a standard complementation assay. Thus, a polypeptide of
interest can be used not only as a model for identifying similar genes in given strains, but
also to identify homologs and orthologs of the polypeptide of interest in other species. The
orthologs, for example, can also be identified in a conventional complementation assay. In
addition or alternatively, such orthologs can be expected to exist in bacteria (or other kinds of
cells) in the same branch of the phylogenetic tree, as set forth, for example, at

ftp://ftp.cme.msu.edu/pub/rdp/SSU-rRNA/SSU/Prok.phvlo . 

[0060] As used herein the term "prey polynucleotide" means a chimeric polynucleotide 
encoding a polypeptide comprising (i) a specific domain; and (ii) a polypeptide that is to be 
tested for interaction with a bait polypeptide. The specific domain is preferably a 
transcriptional activating domain. 

[0061] As used herein, a "bait polynucleotide" is a chimeric polynucleotide encoding a
chimeric polypeptide comprising (i) a complementary domain; and (ii) a polypeptide that is to
be tested for interaction with at least one prey polypeptide. The complementary domain is
preferably a DNA-binding domain that recognizes a binding site that is further detected and
is contained in the host organism.

[0062] As used herein, "complementary domain" means a functional constitution of the
activity when bait and prey are interacting; for example, enzymatic activity. 

[0063] As used herein, "specific domain" means a functional interacting activation
domain that may work through different mechanisms by interacting directly or indirectly
through intermediary proteins with RNA polymerase II or III-associated proteins in the vicinity
of the transcription start site. 

[0064] As used herein the term "complementary" means that, for example, each base of 
a first polynucleotide is paired with the complementary base of a second polynucleotide
whose orientation is reversed. The complementary bases are A and T (or A and U) or C and
G.
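
By way of illustration only, the base-pairing rule stated in this definition can be sketched as follows. The function names reverse_complement and are_complementary are hypothetical, and the sketch assumes unambiguous bases only (A, C, G, T, with U treated as T); any other character raises an error.

```python
# Watson-Crick pairs as stated above: A with T (or U), C with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the orientation of the strand."""
    dna = seq.upper().replace("U", "T")
    return "".join(DNA_PAIRS[base] for base in reversed(dna))


def are_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with `second` read in reverse orientation."""
    second_dna = second.upper().replace("U", "T")
    return reverse_complement(first) == second_dna


print(are_complementary("ATGC", "GCAT"))  # pairs base by base in reversed orientation
print(are_complementary("ATGC", "ATGC"))  # same strand, not its complement
```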

[0065] The term "sequence identity" refers to the identity between two peptides or
between two nucleic acids. Identity between sequences can be determined by comparing a
position in each of the sequences which may be aligned for the purposes of comparison.
When a position in the compared sequences is occupied by the same base or amino acid,
then the sequences are identical at that position. A degree of sequence identity between
nucleic acid sequences is a function of the number of identical nucleotides at positions
shared by these sequences. A degree of identity between amino acid sequences is a
function of the number of identical amino acids at positions shared by these
sequences. Since two polynucleotides may each (i) comprise a sequence (i.e., a portion of a
complete polynucleotide sequence) that is similar between two polynucleotides, and (ii) may
further comprise a sequence that is divergent between two polynucleotides, sequence
identity comparisons between two or more polynucleotides are performed over a "comparison
window". A "comparison window"
refers to the conceptual segment of at least 20 contiguous nucleotide positions wherein a
polynucleotide sequence may be compared to a reference nucleotide sequence of at least
20 contiguous nucleotides and wherein the portion of the polynucleotide sequence in the
comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less
compared to the reference sequence (which does not comprise additions or deletions) for
optimal alignment of the two sequences.

[0066] To determine the percent identity of two amino acid sequences or two nucleic
acid sequences, the sequences are aligned for optimal comparison. For example, gaps can 
be introduced in the sequence of a first amino acid sequence or a first nucleic acid sequence 
for optimal alignment with the second amino acid sequence or second nucleic acid 
sequence. The amino acid residues or nucleotides at corresponding amino acid positions or 
nucleotide positions are then compared. When a position in the first sequence is occupied
by the same amino acid residue or nucleotide as the corresponding position in the second
sequence, the molecules are identical at that position. 

[0067] The percent identity between the two sequences is a function of the number of
identical positions shared by the sequences. Hence % identity = (number of identical
positions / total number of overlapping positions) x 100.
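
The calculation stated above can be sketched, for illustration only, on two sequences that have already been aligned. The function name percent_identity and the use of "-" as the gap character are assumptions; the sketch further assumes that a column containing a gap is not counted as an overlapping position, which is one possible reading of "overlapping positions".

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity = identical positions / overlapping positions x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    overlapping = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not overlapping positions
        overlapping += 1
        if a == b:
            identical += 1
    if overlapping == 0:
        return 0.0
    return 100.0 * identical / overlapping


# Hypothetical aligned pair: 7 overlapping positions, 6 of them identical.
print(round(percent_identity("ACGTACGT", "ACGAAC-T"), 2))
```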

[0068] In this comparison the sequences can be the same length or may be different in 
length. Optimal alignment of sequences for determining a comparison window may be 
conducted by the local homology algorithm of Smith and Waterman (J. Theor. Biol., 91 (2)
pgs. 370-380 (1981)), by the homology alignment algorithm of Needleman and Wunsch (J.
Mol. Biol., 48(3) pgs. 443-453 (1972)), by the search for similarity via the method of Pearson
and Lipman (PNAS, USA, 85(5) pgs. 2444-2448 (1988)), by computerized implementations
of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics
Software Package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison,
Wisconsin) or by inspection. 
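
As a non-limiting illustration of how such an optimal alignment may be obtained, the following is a minimal sketch of a global alignment in the manner of Needleman and Wunsch. It is not the GAP or BESTFIT implementation. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name needleman_wunsch are assumptions chosen for the example; this specification does not prescribe them.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a, aligned_b, total)
```

The aligned strings returned by such a routine have equal length and can be compared position by position to compute a percent identity.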

[0069] The best alignment (i.e., resulting in the highest percentage of identity over the 
comparison window) generated by the various methods is selected. 

[0070] The term "sequence identity" means that two polynucleotide sequences are 
identical (i.e., on a nucleotide by nucleotide basis) over the window of comparison. The term 
"percentage of sequence identity" is calculated by comparing two optimally aligned 
sequences over the window of comparison, determining the number of positions at which the 
identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to yield the 
number of matched positions, dividing the number of matched positions by the total number 
of positions in the window of comparison (i.e., the window size) and multiplying the result by 
100 to yield the percentage of sequence identity. The same process can be applied to 
polypeptide sequences. 

[0071] The percentage of sequence identity of a nucleic acid sequence or an amino acid 
sequence can also be calculated using BLAST software (Version 2.06 of September 1998) 
with the default or user-defined parameters.

[0072] The term "sequence similarity" means that amino acids can be modified while 
retaining the same function. It is known that amino acids are classified according to the 
nature of their side groups and some amino acids such as the basic amino acids can be 
interchanged for one another while their basic function is maintained. 

[0073] The term "isolated" as used herein means that a biological material such as a 
nucleic acid or protein has been removed from its original environment in which it is naturally 
present. For example, a polynucleotide present in a plant, mammal or animal is present in 
its natural state and is not considered to be isolated. The same polynucleotide separated
from the adjacent nucleic acid sequences in which it is naturally inserted in the genome of
the plant or animal is considered as being "isolated." 

[0074] The term "isolated" is not meant to exclude artificial or synthetic mixtures with
other compounds, or the presence of impurities which do not interfere with the biological 
activity and which may be present, for example, due to incomplete purification, addition of 
stabilizers or mixtures with pharmaceutically acceptable excipients and the like. 

[0075] "Isolated polypeptide" or "isolated protein" as used herein means a polypeptide or
protein which is substantially free of those compounds that are normally associated with the
polypeptide or protein in its natural state such as other proteins or polypeptides, nucleic
acids, carbohydrates, lipids and the like. 

[0076] The term "purified" as used herein means at least one order of magnitude of
purification is achieved, preferably two or three orders of magnitude, most preferably four or 
five orders of magnitude of purification of the starting material or of the natural material. 
Thus, the term "purified" as utilized herein does not mean that the material is 100% purified 
and thus excludes any other material. 

[0077] The term "variants" when referring to, for example, polynucleotides encoding a
polypeptide variant of a given reference polypeptide means polynucleotides that differ from the
reference polynucleotide but generally maintain the functional characteristics of the reference
polypeptide. A variant of a polynucleotide may be a naturally occurring allelic variant or it
may be a variant that is not known to occur naturally. Such non-naturally occurring variants
of the reference polynucleotide can be made by, for example, mutagenesis techniques, 
including those mutagenesis techniques that are applied to polynucleotides, cells or 
organisms. 

[0078] Generally, differences are limited so that the nucleotide sequences of the 
reference and variant are closely similar overall and, in many regions, identical. 

[0079] Variants of polynucleotides according to the present invention include, but are not 
limited to, nucleotide sequences which are at least 95% identical after alignment to the 
reference polynucleotide encoding the reference polypeptide. These variants can also have 
96%, 97%, 98% and 99.999% sequence identity to the reference polynucleotide. 
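
The identity threshold can be checked computationally once two sequences have been aligned. The sketch below is illustrative only: it assumes the alignment has already been produced by some external tool, that gaps are written as "-", and that a gap opposite a nucleotide counts as a mismatch. None of these conventions is specified in the text, and the function names are hypothetical.

```python
def percent_identity(aligned_reference: str, aligned_variant: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are skipped; a gap opposite a
    nucleotide is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_reference) != len(aligned_variant):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for ref_base, var_base in zip(aligned_reference.upper(), aligned_variant.upper()):
        if ref_base == "-" and var_base == "-":
            continue  # column carries no information
        compared += 1
        if ref_base == var_base:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def is_variant(aligned_reference: str, aligned_variant: str, threshold: float = 95.0) -> bool:
    """True when the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_reference, aligned_variant) >= threshold
```

For a 20-nucleotide alignment, a single mismatch gives 95% identity and meets the default threshold, while two mismatches give 90% and do not.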

[0080] Nucleotide changes present in a variant polynucleotide may be silent, which 
means that these changes do not alter the amino acid sequences encoded by the reference 
polynucleotide. 
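
Whether a nucleotide change is silent can be tested by translating both sequences and comparing the results. The following sketch assumes the standard genetic code, an in-frame coding sequence whose length is a multiple of three, and hypothetical function names; it is an illustration rather than a method taken from the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA (or RNA) coding sequence into one-letter amino acids."""
    sequence = coding_sequence.upper().replace("U", "T")
    if len(sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[sequence[i:i + 3]] for i in range(0, len(sequence), 3))


def is_silent(reference: str, variant: str) -> bool:
    """True when the variant encodes the same amino acid sequence as the reference."""
    return translate(reference) == translate(variant)
```

For instance, changing GCT to GCC leaves the encoded alanine unchanged and is silent, whereas changing GCT to GAT replaces alanine with aspartate and is not.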

[0081] Substitutions, additions and/or deletions can involve one or more nucleic acids. 
Alterations can produce conservative or non-conservative amino acid substitutions, deletions 
and/or additions. 

[0082] Variants of a prey or a SID® polypeptide encoded by a variant polynucleotide can 
possess a higher affinity of binding and/or a higher specificity of binding to their protein or 
polypeptide counterpart, against which they have been initially selected. In another context, 
variants can also lose their ability to bind to their protein or polypeptide counterpart. 
[0 083] By "anabolic pathway" is meant a reaction or series of reactions in a metabolic 
pathway that synthesize complex molecules from simpler ones, usually requiring the input of 
energy. An anabolic pathway is the opposite of a catabolic pathway. 
[0084] As used herein, a "catabolic pathway" is a series of reactions in a metabolic 
pathway that break down complex compounds into simpler ones, usually releasing energy in 
the process. A catabolic pathway is the opposite of an anabolic pathway. 
[0085] As used herein, "drug metabolism" means the study of how drugs are 
processed and broken down by the body. Drug metabolism can involve the study of 
enzymes that break down drugs, the study of how different drugs interact within the body 
and how diet and other ingested compounds affect the way the body processes drugs. 

[0086] As used herein, "metabolism" means the sum of all of the enzyme-catalyzed 
reactions in living cells that transform organic molecules. 

[0087] By "secondary metabolism" is meant pathways producing specialized metabolic 
products that are not found in every cell. 

[0088] As used herein, "SID®" means a Selected Interacting Domain and is identified as 
follows: for each bait polypeptide screened, selected prey polypeptides are compared. 
Overlapping fragments in the same ORF or CDS define the selected interacting domain. 

[0089] As used herein the term "PIM®" means a protein-protein interaction map. This 
map is obtained from data acquired from a number of separate screens using different bait 
polypeptides and is designed to map out all of the interactions between the polypeptides. 
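
One simple way to represent such a map in software is as a mapping from each bait to the set of preys found in its screens. The sketch below is purely illustrative: the data structure, the function names and the bait and prey labels are hypothetical and are not taken from the tables of this specification.

```python
from collections import defaultdict


def build_interaction_map(screens):
    """Merge screen results into a bait -> set-of-preys mapping.

    `screens` is an iterable of (bait, list_of_preys) tuples, one per screen.
    """
    interaction_map = defaultdict(set)
    for bait, preys in screens:
        interaction_map[bait].update(preys)
    return dict(interaction_map)


def baits_for_prey(interaction_map, prey):
    """List, in sorted order, every bait that was found to interact with `prey`."""
    return sorted(bait for bait, preys in interaction_map.items() if prey in preys)
```

With hypothetical screens `[("bait_A", ["prey_1", "prey_2"]), ("bait_B", ["prey_2"])]`, the map links `prey_2` to both baits, which is the kind of shared partner a map built from separate screens is meant to reveal.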

[0090] The term "affinity of binding", as used herein, can be defined as the affinity 
constant Ka of a given SID® polypeptide of the present invention that binds to a 
polypeptide, and is given by the following mathematical relationship: 

[0091]-[0093] $$K_a = \frac{[\text{SID®/polypeptide complex}]}{[\text{free SID®}]\,[\text{free polypeptide}]}$$

[0094] wherein [free SID®], [free polypeptide] and [SID®/polypeptide complex] are 
the concentrations at equilibrium, respectively, of the free SID® polypeptide, of the free 
polypeptide onto which the SID® polypeptide binds and of the complex formed between 
the SID® polypeptide and the polypeptide onto which said SID® polypeptide specifically binds. 
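
The relationship defining Ka can be evaluated directly from the three equilibrium concentrations. The following is a minimal sketch with a hypothetical function name; it assumes all concentrations are expressed in the same units (for example mol/L, giving Ka in L/mol).

```python
def affinity_constant(complex_conc: float, free_sid_conc: float, free_polypeptide_conc: float) -> float:
    """Ka = [complex] / ([free SID] * [free polypeptide]), all measured at equilibrium."""
    if complex_conc < 0:
        raise ValueError("complex concentration cannot be negative")
    if free_sid_conc <= 0 or free_polypeptide_conc <= 0:
        raise ValueError("free concentrations must be positive")
    return complex_conc / (free_sid_conc * free_polypeptide_conc)
```

As an illustration with invented values, a complex at 5e-8 mol/L in equilibrium with 1e-7 mol/L of each free partner corresponds to a Ka of about 5e6 L/mol.
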
[0095] The affinity of a SID® polypeptide of the present invention or a variant thereof for 
its polypeptide counterpart can be assessed, for example, on a Biacore™ apparatus 
marketed by Amersham Pharmacia Biotech Company such as described by Szabo et al., Curr. 
Opin. Struct. Biol. 5 pgs. 699-705 (1995) and by Edwards and Leatherbarrow, Anal. Biochem. 
246 pgs. 1-6 (1997). 

[0096] As used herein the phrase "at least the same affinity" with respect to the binding 
affinity between a SID® polypeptide of the present invention and another polypeptide means 
that the Ka is identical or can be at least two-fold, at least three-fold or at least five-fold 
greater than the Ka value of reference. 
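
The comparison against a reference Ka can be expressed as a fold-change test. This sketch uses hypothetical function names and assumes both Ka values were measured under the same conditions and in the same units.

```python
def affinity_fold(ka_variant: float, ka_reference: float) -> float:
    """Ratio of the variant's Ka to the reference Ka."""
    if ka_variant <= 0 or ka_reference <= 0:
        raise ValueError("Ka values must be positive")
    return ka_variant / ka_reference


def has_at_least_same_affinity(ka_variant: float, ka_reference: float, min_fold: float = 1.0) -> bool:
    """True when the variant's Ka is at least `min_fold` times the reference Ka.

    min_fold=1.0 corresponds to "identical or greater"; 2.0, 3.0 or 5.0
    correspond to the two-fold, three-fold and five-fold levels in the text.
    """
    return affinity_fold(ka_variant, ka_reference) >= min_fold
```

For example, a variant with a Ka of 1e7 against a reference Ka of 5e6 satisfies the two-fold level but not the three-fold level.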

[0097] As used herein, the term "modulating compound" means a compound that 
inhibits or stimulates or can act on another protein which can inhibit or stimulate the protein- 
protein interaction of a complex of two polypeptides or the protein-protein interaction of two 
polypeptides. 

[0098] More specifically, the present invention comprises complexes of polypeptides or 
polynucleotides encoding the polypeptides composed of a bait polypeptide, or a bait 
polynucleotide encoding a bait polypeptide and a prey polypeptide or a prey polynucleotide 
encoding a prey polypeptide. The prey polypeptide or prey polynucleotide encoding the prey 
polypeptide is capable of interacting with a bait polypeptide of interest in various hybrid 
systems. 

[0099] As described in the Background of the present invention there are various 
methods known in the art to identify prey polypeptides that interact with bait polypeptides of 
interest. These methods include, but are not limited to, generic two-hybrid systems as 
described by Fields et al in Nature, 340:245-246 (1989) and more specifically in U.S. Patent 
Nos. 5,283,173, 5,468,614 and 5,667,973, which are hereby incorporated by reference; the 
reverse two-hybrid system described by Vidal et al, supra; the two plus one hybrid method 
described, for example, in Tirode et al, supra; the yeast forward and reverse 'n'-hybrid 
systems as described in Vidal and Legrain, supra; the method described in WO 99/42612; 
those methods described in Legrain et al FEBS Letters 480 pgs. 32-36 (2000) and the like. 
[0100] The present invention is not limited to the type of method utilized to detect 
protein-protein interactions and therefore any method known in the art and variants thereof 
can be used. It is, however, preferable to use the method described in WO 99/42612 or WO 
00/66722, both of which are incorporated herein by reference, owing to the methods' sensitivity, 
reproducibility and reliability. 

[0101] Protein-protein interactions can also be detected using complementation assays 
such as those described by Pelletier et al. at 
http://www.abrf.org/JBT/Articles/JBT0012/jbt0012.html, WO 00/07038 and WO 98/34120. 
[0102] Although the above methods are described for applications in the yeast system, 
the present invention is not limited to detecting protein-protein interactions using yeast, but 
also includes similar methods that can be used in detecting protein-protein interactions in, for 
example, mammalian systems as described, for example in Takacs et al., Proc. Natl. Acad. 
Sci., USA, 90(21):10375-79 (1993) and Vasavada et al., Proc. Natl. Acad. Sci., USA, 88 
(23):10686-90 (1991), as well as a bacterial two-hybrid system as described in Karimova et 
al. (1998), WO 99/28746, WO 00/66722 and Legrain et al., FEBS Letters, 480 pgs. 32-36 
(2000). 

[0103] Although the above-described methods are limited to the use of yeast, mammalian cells 
and Escherichia coli cells, the present invention is not limited in this manner. Consequently, 
mammalian and typically human cells, as well as bacterial, yeast, fungus, insect, nematode 
and plant cells are encompassed by the present invention and may be transfected by the 
nucleic acid or recombinant vector as defined herein. 

[0104] Examples of suitable cells include, but are not limited to, VERO cells, HeLa cells 
such as ATCC No. CCL2, CHO cell lines such as ATCC No. CCL61, COS cells such as 
COS-7 cells and ATCC No. CRL 1650 cells, WI38, BHK, HepG2, 3T3 such as ATCC No. 
CRL6361, A549, PC12, K562 cells, 293 cells, Sf9 cells such as ATCC No. CRL1711 and 
Cv1 cells such as ATCC No. CCL70. 

[0105] Other suitable cells that can be used in the present invention include, but are not 
limited to, prokaryotic host cell strains such as Escherichia coli (e.g., strain DH5-α), 
Bacillus subtilis, Salmonella typhimurium, or strains of the genera of Pseudomonas, 
Streptomyces and Staphylococcus. 

[0106] Further suitable cells that can be used in the present invention include yeast cells 
such as those of Saccharomyces such as Saccharomyces cerevisiae. 
[0107] The bait polynucleotide, as well as the prey polynucleotide can be prepared 
according to the methods known in the art such as those described above in the publications 
and patents reciting the known method per se. 

[0108] The bait polynucleotide of the present invention is obtained from Shigella flexneri 
(see Table I). The prey polynucleotide is obtained from a human placenta cDNA or variants 
thereof and fragments from the genome or transcriptome of human placenta ranging from 
about 12 to about 5,000, or from about 12 to about 10,000, or from about 12 to about 20,000 
nucleotides. The prey polynucleotide is then selected, sequenced and identified. 

[0109] A human placenta cDNA prey library is prepared from global human placenta and 
constructed in the specially designed prey vector pP6 as shown in Figure 10 after ligation of 
suitable linkers such that every cDNA fragment insert is fused to a nucleotide sequence in 
the vector that encodes the transcription activation domain of a reporter gene. Any 
transcription activation domain can be used in the present invention. Examples include, but 
are not limited to, Gal4, VP16, B42, His and the like. Toxic reporter genes, such as CAT R, 
CYH2, CYH1, URA3, bacterial and fungal toxins and the like can be used in reverse two-hybrid 
systems. 
[0110] The polypeptides encoded by the nucleotide inserts of the human placenta cDNA 
prey library thus prepared are termed "prey polypeptides" in the context of the presently 
described selection method of the prey polynucleotides. 

[0111] The bait polynucleotide can be inserted in bait plasmid pB6 or pB20 as illustrated 
in Figure 3 or 6 respectively. The bait polynucleotide insert is fused to a polynucleotide 
encoding the binding domain of, for example, the Gal4 DNA binding domain and the shuttle 
expression vector is used to transform cells. The bait polynucleotides used in the present 
invention are described in Table I. As stated above, any cells can be utilized in transforming 
the bait and prey polynucleotides of the present invention including mammalian cells, 
bacterial cells, yeast cells, insect cells and the like. 

[0112] In an embodiment, the present invention identifies protein-protein interactions in 
yeast. In using known methods a prey positive clone is identified containing a vector which 
comprises a nucleic acid insert encoding a prey polypeptide which binds to a bait 
polypeptide of interest. The method in which protein-protein interactions are identified 
comprises the following steps: 

[0113] i) mating at least one first haploid recombinant yeast cell clone from a recombinant 
yeast cell clone library that has been transformed with a plasmid containing the prey 
polynucleotide to be assayed with a second haploid recombinant yeast cell clone 
transformed with a plasmid containing a bait polynucleotide encoding the bait 
polypeptide; 

[0114] ii) cultivating diploid cell clones obtained in step i) on a selective medium; and 
[0115] iii) selecting recombinant cell clones which grow on the selective medium. 
[0116] This method may further comprise the step of: 

[0117] iv) characterizing the prey polynucleotide contained in each recombinant cell 
clone which is selected in step iii). 
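
The logic of steps i) to iv) can be summarized in a toy simulation. This is a conceptual sketch only, not a laboratory protocol: the `interacts` predicate stands in for the biology (reporter activation when bait and prey bind), and all names and labels are hypothetical.

```python
def two_hybrid_screen(prey_library, bait, interacts):
    """Toy model of the mating-based screen.

    prey_library: iterable of prey identifiers (haploid clones carrying prey plasmids).
    bait: identifier of the bait carried by the second haploid clone.
    interacts: callable (bait, prey) -> bool, a stand-in for reporter activation.
    """
    # Step i): mate each prey clone with the bait clone to obtain diploids.
    diploids = [(bait, prey) for prey in prey_library]
    # Steps ii) and iii): on selective medium only diploids whose bait and prey
    # interact activate the reporter and grow; those clones are selected.
    selected = [(b, p) for b, p in diploids if interacts(b, p)]
    # Step iv): characterize the prey polynucleotide in each selected clone.
    return sorted({prey for _, prey in selected})
```

Calling the function with a library of three hypothetical preys and a predicate that accepts only one of them returns just that prey, mirroring how growth on selective medium reports the interaction.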

[0118] In yet another embodiment of the present invention, in lieu of yeast, Escherichia 
coli is used in a bacterial two-hybrid system, which encompasses a similar principle to that 
described above for yeast, but does not involve mating for characterizing the prey 
polynucleotide. 

[0119] In yet another embodiment of the present invention, mammalian cells and a 
method similar to that described above for yeast for characterizing the prey polynucleotide 
are used. 

[0120] By performing the yeast, bacterial or mammalian two-hybrid system it is possible 
to identify for one particular bait an interacting prey polypeptide. The prey polynucleotide that 
has been selected by testing the library of preys in a screen using the two-hybrid, two plus 
one hybrid methods and the like, encodes the polypeptide interacting with the protein of 
interest. 
[0121] The present invention is also directed, in a general aspect, to a complex of 
polypeptides, polynucleotides encoding the polypeptides composed of a bait polypeptide or 
bait polynucleotide encoding the bait polypeptide and a prey polypeptide or prey 
polynucleotide encoding the prey polypeptide capable of interacting with the bait polypeptide 
of interest. These complexes are identified in Table II, as the bait amino acid sequences 
and the prey amino acid sequences, as well as the bait and prey nucleic acid sequences. 
[0122] In another aspect, the present invention relates to a complex of polynucleotides 
consisting of a first polynucleotide, or a fragment thereof, encoding a prey polypeptide that 
interacts with a bait polypeptide and a second polynucleotide or a fragment thereof. This 
fragment has at least 12 consecutive nucleotides, but can have between 12 and 5,000 
consecutive nucleotides, or between 12 and 10,000 consecutive nucleotides or between 12 
and 20,000 consecutive nucleotides. 

[0123] The polypeptides of columns 1 and 3 of Table II according to the present 
invention and the complexes of these two polypeptides also form part of the present 
invention. More specifically, the polypeptides of SEQ ID NOS. 1 to 7 are part of the present 
invention, as are their complexes with the polypeptides of Column 3, Table II. 
[0124] In yet another embodiment, the present invention relates to an isolated complex 
of at least two polypeptides encoded by two polynucleotides wherein said two polypeptides 
are associated in the complex by affinity binding and are depicted in columns 1 and 3 of 
Table II. 

[0125] In yet another embodiment, the present invention relates to an isolated complex 
comprising at least a polypeptide as described in column 1 of Table II and a polypeptide as 
described in column 3 of Table II. The present invention is not limited to these polypeptide 
complexes alone but also includes the isolated complex of the two polypeptides in which 
fragments and/or homologous polypeptides exhibit at least 95% sequence identity, as 
well as from 96% sequence identity to 99.999% sequence identity. 

[0126] Also encompassed in another embodiment of the present invention is an isolated 
complex in which SID® of the prey polypeptides encoded by SEQ ID Nos. 15 to 215 in Table 
III form the isolated complex. 

[0127] Besides the isolated complexes described above, nucleic acids coding for a 
Selected Interacting Domain (SID®) polypeptide or a variant thereof or any of the nucleic 
acids set forth in Table III can be inserted into an expression vector which contains the 
necessary elements for the transcription and translation of the inserted protein-coding 
sequence. Such transcription elements include a regulatory region and a promoter. Thus, 
the nucleic acid which may encode a marker compound of the present invention is operably 
linked to a promoter in the expression vector. The expression vector may also include a 
replication origin. 
[0128] A wide variety of host/expression vector combinations are employed in 
expressing the nucleic acids of the present invention. Useful expression vectors that can be 
used include, for example, segments of chromosomal, non-chromosomal and synthetic DNA 
sequences. Suitable vectors include, but are not limited to, derivatives of SV40 and pcDNA 
and known bacterial plasmids such as ColE1, pCR1, pBR322, pMal-C2, pET, pGEX as 
described by Smith et al [need cite 1988], pMB9 and derivatives thereof, plasmids such as 
RP4, phage DNAs such as the numerous derivatives of phage λ such as NM989, as well as 
other phage DNA such as M13 and filamentous single stranded phage DNA; yeast plasmids 
such as the 2 micron plasmid or derivatives of the 2μ plasmid, as well as centromeric and 
integrative yeast shuttle vectors; vectors useful in eukaryotic cells such as vectors useful in 
insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs, 
such as plasmids that have been modified to employ phage DNA or the expression control 
sequences; and the like. 

[0129] For example, in a baculovirus expression system, both non-fusion transfer 
vectors, such as, but not limited to, pVL941 (BamHI cloning site; Summers), pVL1393 (BamHI, 
SmaI, XbaI, EcoRI, NotI, XmaIII, BglII and PstI cloning sites; Invitrogen), pVL1392 (BglII, PstI, 
NotI, XmaIII, EcoRI, XbaI, SmaI and BamHI cloning sites; Summers and Invitrogen) and 
pBlueBacIII (BamHI, BglII, PstI, NcoI and HindIII cloning sites, with blue/white recombinant 
screening; Invitrogen), and fusion transfer vectors such as, but not limited to, pAc700 (BamHI 
and KpnI cloning sites, in which the BamHI recognition site begins with the initiation codon; 
Summers), pAc701 and pAc702 (same as pAc700, with different reading frames), pAc360 
(BamHI cloning site 36 base pairs downstream of a polyhedrin initiation codon; Invitrogen 
(195)) and pBlueBacHisA, B, C (three different reading frames with BamHI, BglII, PstI, NcoI 
and HindIII cloning sites, an N-terminal peptide for ProBond purification and blue/white 
recombinant screening of plaques; Invitrogen (220)), can be used. 

[0130] Mammalian expression vectors contemplated for use in the invention include 
vectors with inducible promoters, such as the dihydrofolate reductase promoters, any 
expression vector with a DHFR expression cassette or a DHFR/methotrexate co-amplification 
vector such as pED (PstI, SalI, SbaI, SmaI and EcoRI cloning sites, with the 
vector expressing both the cloned gene and DHFR; Kaufman, 1991). Alternatively, a 
glutamine synthetase/methionine sulfoximine co-amplification vector, such as pEE14 
(HindIII, XbaI, SmaI, SbaI, EcoRI and BclI cloning sites, in which the vector expresses 
glutamine synthetase and the cloned gene; Celltech), can be used. A vector that directs episomal 
expression under the control of the Epstein Barr Virus (EBV) or nuclear antigen (EBNA) can 
be used, such as pREP4 (BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII and KpnI cloning 
sites, constitutive RSV-LTR promoter, hygromycin selectable marker; Invitrogen), pCEP4 
(BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII and KpnI cloning sites, constitutive hCMV 
immediate early gene promoter, hygromycin selectable marker; Invitrogen), pMEP4 (KpnI, 
PvuI, NheI, HindIII, NotI, XhoI, SfiI, BamHI cloning sites, inducible metallothionein IIa gene 
promoter, hygromycin selectable marker; Invitrogen), pREP8 (BamHI, XhoI, NotI, HindIII, 
NheI and KpnI cloning sites, RSV-LTR promoter, histidinol selectable marker; Invitrogen), 
pREP9 (KpnI, NheI, HindIII, NotI, XhoI, SfiI, BamHI cloning sites, RSV-LTR promoter, G418 
selectable marker; Invitrogen), and pEBVHis (RSV-LTR promoter, hygromycin selectable 
marker, N-terminal peptide purifiable via ProBond resin and cleaved by enterokinase; 
Invitrogen). 

[0131] Selectable mammalian expression vectors for use in the invention include, but 
are not limited to, pRc/CMV (HindIII, BstXI, NotI, SbaI and ApaI cloning sites, G418 
selection; Invitrogen), pRc/RSV (HindIII, SpeI, BstXI, NotI, XbaI cloning sites, G418 selection; 
Invitrogen) and the like. Vaccinia virus mammalian expression vectors (see, for example, 
Kaufman 1991) that can be used in the present invention include, but are not limited to, 
pSC11 (SmaI cloning site, TK- and β-gal selection), pMJ601 (SalI, SmaI, AflI, NarI, BspMII, 
BamHI, ApaI, NheI, SacII, KpnI and HindIII cloning sites; TK- and β-gal selection), 
pTKgptF1S (EcoRI, PstI, SalI, AccI, HindII, SbaI, BamHI and Hpa cloning sites, TK or XPRT 
selection) and the like. 

[0132] Yeast expression systems that can also be used in the present invention include, but are 
not limited to, the non-fusion pYES2 vector (XbaI, SphI, ShoI, NotI, GstXI, EcoRI, BstXI, 
BamHI, SacI, KpnI and HindIII cloning sites; Invitrogen), the fusion pYESHisA, B, C (XbaI, 
SphI, ShoI, NotI, BstXI, EcoRI, BamHI, SacI, KpnI and HindIII cloning sites, N-terminal 
peptide purified with ProBond resin and cleaved with enterokinase; Invitrogen), pRS vectors 
and the like. 

[0133] Consequently, mammalian and typically human cells, as well as bacterial, yeast, 
fungi, insect, nematode and plant cells can be used in the present invention and may be 
transfected by the nucleic acid or recombinant vector as defined herein. 
[0134] Examples of suitable cells include, but are not limited to, VERO cells, HeLa cells 
such as ATCC No. CCL2, CHO cell lines such as ATCC No. CCL61, COS cells such as 
COS-7 cells and ATCC No. CRL 1650 cells, WI38, BHK, HepG2, 3T3 such as ATCC No. 
CRL6361, A549, PC12, K562 cells, 293 cells, Sf9 cells such as ATCC No. CRL1711 and 
Cv1 cells such as ATCC No. CCL70. 

[0135] Other suitable cells that can be used in the present invention include, but are not 
limited to, prokaryotic host cell strains such as Escherichia coli (e.g., strain DH5-α), 
Bacillus subtilis, Salmonella typhimurium, or strains of the genera of Pseudomonas, 
Streptomyces and Staphylococcus. 

[0136] Further suitable cells that can be used in the present invention include yeast cells 
such as those of Saccharomyces such as Saccharomyces cerevisiae. 
[0137] Besides the specific isolated complexes, as described above, the present 
invention relates to and also encompasses SID® polynucleotides. As explained above, for 
each bait polypeptide, several prey polypeptides may be identified by comparing and 
selecting the intersection of all isolated fragments that are included in the same 
polypeptide. Thus the SID® polynucleotides of the present invention are represented by the 
shared nucleic acid sequences of SEQ ID Nos. 15 to 215 encoding the SID® polypeptides of 
SEQ ID Nos. 216 to 416 in columns 5 and 7 of Table III, respectively. 
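
The intersection of overlapping prey fragments can be computed as a simple interval intersection. The sketch below is a hypothetical illustration: it assumes each selected fragment has already been mapped to the same ORF and is described by inclusive start and end coordinates, and the function name is not taken from the specification.

```python
def selected_interacting_domain(fragments):
    """Return the region shared by all fragments, or None if they do not all overlap.

    fragments: list of (start, end) tuples with inclusive coordinates on one ORF.
    """
    if not fragments:
        raise ValueError("at least one fragment is required")
    shared_start = max(start for start, _ in fragments)
    shared_end = min(end for _, end in fragments)
    if shared_start > shared_end:
        return None  # the fragments have no common region
    return (shared_start, shared_end)
```

For three hypothetical fragments spanning positions 10 to 200, 50 to 250 and 40 to 180, the shared region is positions 50 to 180.
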
[0138] The present invention is not limited to the SID® sequences as described in the 
above paragraph, but also includes fragments of these sequences having at least 12 
consecutive nucleic acids, between 12 and 5,000 consecutive nucleic acids, between 12 
and 10,000 consecutive nucleic acids or between 12 and 20,000 consecutive nucleic 
acids, as well as variants thereof. The fragments or variants of the SID® sequences 
possess at least the same affinity of binding to their protein or polypeptide counterpart, against 
which they have been initially selected. Moreover, these variants and/or fragments of the SID® 
sequences alternatively can have between 95% and 99.999% sequence identity to their 
protein or polypeptide counterpart. 

[0139] According to the present invention the variants can be created by known 
mutagenesis techniques either in vitro or in vivo. Such a variant can be created such that it 
has altered binding characteristics with respect to the target protein and more specifically 
that the variant binds the target sequence with either higher or lower affinity. 

[0140] Polynucleotides that are complementary to the above sequences which include 
the polynucleotides of the SID®'s, their fragments, variants and those that have specific 
sequence identity are also included in the present invention. 

[0141] The polynucleotide encoding the SID® polypeptide, fragment or variant thereof 
can also be inserted into recombinant vectors which are described in detail above. 

[0142] The present invention also relates to a composition comprising the above- 
mentioned recombinant vectors containing the SID® polypeptides in Table III, fragments or 
variants thereof, as well as recombinant host cells transformed by the vectors. The 
recombinant host cells that can be used in the present invention were discussed in greater 
detail above. 

[0143] The compositions comprising the recombinant vectors can contain physiologically 
acceptable carriers such as diluents, adjuvants, excipients and any vehicle in which this 
composition can be delivered therapeutically and can include, but are not limited to, sterile 
liquids such as water and oils. 

[0144] In yet another embodiment, the present invention relates to a method of selecting 
modulating compounds, as well as the modulating molecules or compounds themselves 
which may be used in a pharmaceutical composition. These modulating compounds may 
act as a cofactor, as an inhibitor, as antibodies, as tags, as a competitive inhibitor, as an 
activator or alternatively have agonistic or antagonistic activity on the protein-protein 
interactions. 

[0145] The activity of the modulating compound does not necessarily, for example, have 
to be 100% activation or inhibition. Indeed, even partial activation or inhibition can be 
achieved that is of pharmaceutical interest. 

[0146] The modulating compound can be selected according to a method which 
comprises: 

[0147] cultivating a recombinant host cell with a modulating compound on a selective 
medium and a reporter gene the expression of which is toxic for said recombinant host cell 
wherein said recombinant host cell is transformed with two vectors: 

[0148] wherein said first vector comprises a polynucleotide encoding a first hybrid 
polypeptide having a DNA binding domain; 

[0149] wherein said second vector comprises a polynucleotide encoding a second 
hybrid polypeptide having a transcriptional activating domain that activates said toxic 
reporter gene when the first and second hybrid polypeptides interact; 

[0150] selecting said modulating compound which inhibits or permits the growth of said 
recombinant host cell. 

[0151] Thus, the present invention relates to a modulating compound that inhibits the 
protein-protein interactions between Shigella flexneri polypeptide and human placenta 
polypeptide of columns 1 and 3 of Table II, respectively. The present invention also relates 
to a modulating compound that activates the protein-protein interactions between Shigella 
flexneri polypeptide and human placenta polypeptide of columns 1 and 3 of Table II, 
respectively. 

[0152] In yet another embodiment, the present invention relates to a method of selecting 
a modulating compound, which modulating compound inhibits the interaction between 
Shigella flexneri polypeptide and human placenta polypeptide of columns 1 and 3 of Table II, 
respectively. This method comprises: 

(a) cultivating a recombinant host cell with a modulating compound on a selective 
medium and a reporter gene the expression of which is toxic for said recombinant host cell 
wherein said recombinant host cell is transformed with two vectors: 

(i) wherein said first vector comprises a polynucleotide encoding a first hybrid 
polypeptide having a first domain of an enzyme; 

(ii) wherein said second vector comprises a polynucleotide encoding a second 
hybrid polypeptide having an enzymatic transcriptional activating domain that activates said 
toxic reporter gene when the first and second hybrid polypeptides interact; 

(b) selecting said modulating compound which inhibits or permits the growth of said 
recombinant host cell. 

[0153] In the two methods described above any toxic reporter gene can be utilized 
including those reporter genes that can be used for negative selection including the URA3 
gene, the CYH1 gene, the CYH2 gene and the like. 
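
The selection principle behind a toxic reporter can be stated as a small piece of logic: the reporter is expressed only when the two hybrid polypeptides interact, so a host cell grows when a compound prevents that interaction. The sketch below is a conceptual model with hypothetical names and made-up compound labels, not an assay protocol.

```python
def host_cell_grows(hybrids_interact: bool, compound_blocks_interaction: bool) -> bool:
    """Model growth of a host cell carrying a toxic (negative-selection) reporter."""
    # The toxic reporter is activated only while the two hybrid polypeptides interact.
    reporter_expressed = hybrids_interact and not compound_blocks_interaction
    # Expression of the toxic reporter prevents growth.
    return not reporter_expressed


def select_inhibitors(compounds):
    """Return, in sorted order, the compounds that permit growth.

    compounds: mapping of compound name -> True if it blocks the interaction.
    """
    return sorted(
        name for name, blocks in compounds.items()
        if host_cell_grows(hybrids_interact=True, compound_blocks_interaction=blocks)
    )
```

In this model, a hypothetical library `{"cmpd_1": False, "cmpd_2": True}` yields `cmpd_2` as the only compound permitting growth, which is why growth on the selective medium identifies inhibitors of the interaction.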

[0154] In yet another embodiment, the present invention provides a kit for screening a 
modulating compound. This kit comprises a recombinant host cell which comprises a 
reporter gene the expression of which is toxic for the recombinant host cell. The host cell is 
transformed with two vectors. The first vector comprises a polynucleotide encoding a first 
hybrid polypeptide having a DNA binding domain; and a second vector comprises a 
polynucleotide encoding a second hybrid polypeptide having a transcriptional activating 
domain that activates said toxic reporter gene when the first and second hybrid polypeptides 
interact. 

[0155] In yet another embodiment a kit is provided for screening a modulating 
compound by providing a recombinant host cell, as described in the paragraph above, but 
instead of a DNA binding domain, the first vector comprises a first hybrid polypeptide 
containing a first domain of a protein. The second vector comprises a second polypeptide 
containing a second part of a complementary domain of a protein that activates the toxic 
reporter gene when the first and second hybrid polypeptides interact. 
[0156] In the selection methods described above, the activating domain can be B42, Gal4 
or VP16 (HSV) and the DNA-binding domain can be derived from Gal4 or LexA. The protein 
or enzyme can be adenylate cyclase, guanylate cyclase, DHFR and the like. 
[0157] Examples of modulating compounds are set forth in Table III. 
[0158] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising the modulating compounds for preventing or treating bacillary 
dysentery in a human or animal, most preferably in a mammal. 

[0159] This pharmaceutical composition comprises a pharmaceutically acceptable 
amount of the modulating compound. The pharmaceutically acceptable amount can be 
estimated from cell culture assays. For example, a dose can be formulated in animal 
models to achieve a circulating concentration range that includes or encompasses a 
concentration point or range having the desired effect in an in vitro system. This information 
can thus be used to accurately determine the doses in other mammals, including humans 
and animals. 

[0160] The therapeutically effective dose refers to that amount of the compound that 
results in amelioration of symptoms in a patient. Toxicity and therapeutic efficacy of such 
compounds can be determined by standard pharmaceutical procedures in cell cultures or in 
experimental animals. For example, the LD50 (the dose lethal to 50% of the population) as 
well as the ED50 (the dose therapeutically effective in 50% of the population) can be 
determined using methods known in the art. The dose ratio between toxic and therapeutic 
effects is the therapeutic index, which can be expressed as the ratio between LD50 and 
ED50. Compounds that exhibit high therapeutic indexes are preferred. 
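
The therapeutic index is a simple ratio and can be computed as follows. This is a minimal sketch with a hypothetical function name; it assumes LD50 and ED50 are expressed in the same dose units, and the numerical values mentioned afterwards are invented for illustration.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as the ratio LD50 / ED50 (same dose units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50
```

For example, a compound with an LD50 of 500 mg/kg and an ED50 of 25 mg/kg has a therapeutic index of 20; a larger value indicates a wider margin between the effective and the lethal dose.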

[0161] The data obtained from the cell culture and animal studies can be used in 
formulating a range of dosage of such compounds which lies preferably within a range of 
circulating concentrations that include the ED50 with little or no toxicity. 
[0162] The pharmaceutical composition can be administered via any route such as 
locally, orally, systemically, intravenously, intramuscularly, mucosally, using a patch and can 
be encapsulated in liposomes, microparticles, microcapsules, and the like. The 
pharmaceutical composition can be embedded in liposomes or even encapsulated. 

[0163] Any pharmaceutically acceptable carrier or adjuvant can be used in the 
pharmaceutical composition. The modulating compound will be preferably in a soluble form 
combined with a pharmaceutically acceptable carrier. The techniques for formulating and 
administering these compounds can be found in "Remington's Pharmaceutical Sciences" 
Mack Publication Co., Easton, PA, latest edition. 
[0164] The mode of administration, optimum dosages and galenic forms can be 
determined by the criteria known in the art, taking into account the seriousness of the general 
condition of the mammal, the tolerance of the treatment and the side effects. 

[0165] The present invention also relates to a method of treating or preventing bacillary 
dysentery in a human or mammal in need of such treatment. This method comprises 
administering to a mammal in need of such treatment a pharmaceutically effective amount of 
a modulating compound which binds to a targeted Shigella protein. In a preferred 
embodiment, the modulating compound is a polynucleotide which may be placed under the 
control of a regulatory sequence which is functional in the mammal or human. 
[0166] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising a SID® polypeptide, a fragment or variant thereof. The SID® 
polypeptide, fragment or variant thereof can be used in a pharmaceutical composition 
provided that it is endowed with highly specific binding properties to a bait polypeptide of 
interest. 






[0167] The original properties of the SID® polypeptide or variants thereof interfere with 
the naturally occurring interaction between a first protein and a second protein within the 
cells of the organism. Thus, the SID® polypeptide binds specifically to either the first 
polypeptide or the second polypeptide. 

[0168] Therefore, the SID® polypeptides of the present invention or variants thereof 
interfere with protein-protein interactions between Shigella or Escherichia polypeptides and 
a mammalian polypeptide. 

[0169] Thus, the present invention relates to a pharmaceutical composition comprising a 
pharmaceutically acceptable amount of a SID® polypeptide or variant thereof, provided that 
the variant has the above-mentioned two characteristics; i.e., that it is endowed with highly 
specific binding properties to a bait polypeptide of interest and is devoid of biological activity 
of the naturally occurring protein. 

[0170] In yet another embodiment, the present invention relates to a pharmaceutical 
composition comprising a pharmaceutically effective amount of a polynucleotide encoding a 
SID® polypeptide or a variant thereof wherein the polynucleotide is placed under the control 
of an appropriate regulatory sequence. Appropriate regulatory sequences that are used are 
polynucleotide sequences derived from promoter elements and the like. 
[0171] Polynucleotides that can be used in the pharmaceutical composition of the 
present invention include the nucleotide sequences of SID®s of SEQ ID Nos. 15 to 215. 
[0172] Besides the SID® polypeptides and polynucleotides, the pharmaceutical 
composition of the present invention can also include a recombinant expression vector 
comprising the polynucleotide encoding the SID® polypeptide, fragment or variant thereof. 
[0173] The above described pharmaceutical compositions can be administered by any 
route such as orally, systemically, intravenously, intramuscularly, intradermally, mucosally, 
encapsulated, using a patch and the like. Any pharmaceutically acceptable carrier or 
adjuvant can be used in this pharmaceutical composition. 

[0174] The SID® polypeptides as active ingredients will be preferably in a soluble form 
combined with a pharmaceutically acceptable carrier. The techniques for formulating and 
administering these compounds can be found in "Remington's Pharmaceutical Sciences" 
supra. 

[0175] The amount of pharmaceutically acceptable SID® polypeptides can be 
determined as described above for the modulating compounds using cell culture and animal 
models. 

[0176] Such compounds can be used in a pharmaceutical composition to treat or 
prevent bacillary dysentery. 

[0177] Thus, the present invention also relates to a method of preventing or treating 
bacillary dysentery in a mammal, said method comprising the steps of administering to a 
mammal in need of such treatment a pharmaceutically effective amount of a recombinant 
expression vector comprising a polynucleotide encoding a SID® polypeptide which binds 
either to a Shigella flexneri protein or to a human placenta protein involved in a protein-
protein interaction between a Shigella flexneri protein and a human placenta protein. More 
specifically, the present invention relates to a method of preventing or treating bacillary 
dysentery in a mammal, said method comprising the steps of administering to a mammal in 
need of such treatment a pharmaceutically effective amount of: 

(1) a SID® polypeptide of SEQ ID Nos. 216 to 416 or a variant thereof which binds to a 
targeted Shigella flexneri protein or human placenta protein; or 

(2) a SID® polynucleotide encoding a SID® polypeptide of SEQ ID Nos. 15 to 215 or a 
variant or a fragment thereof wherein said polynucleotide is placed under the control of a 
regulatory sequence which is functional in said mammal; or 

(3) a recombinant expression vector comprising a polynucleotide encoding a SID® 
polypeptide which binds either to a Shigella flexneri protein or to a human placenta protein 
involved in a protein-protein interaction between a Shigella flexneri protein and a human 
placenta protein. 

[0178] In another embodiment of the present invention, nucleic acids comprising a 
sequence of SEQ ID Nos. 15 to 215 which encodes the protein of sequence SEQ ID Nos. 
216 to 416 and/or functional derivatives thereof are administered to modulate complex (from 
Table II) function by way of gene therapy. Any of the methodologies relating to gene therapy 
available within the art may be used in the practice of the present invention such as those 
described by Goldspiel et al Clin. Pharm. 12 pgs. 488-505 (1993). 

[0179] Delivery of the therapeutic nucleic acid into a patient may be direct in vivo gene 
therapy (i.e., the patient is directly exposed to the nucleic acid or nucleic acid-containing 
vector) or indirect ex vivo gene therapy (i.e., cells are first transformed with the nucleic acid 
in vitro and then transplanted into the patient). 

[0180] For example for in vivo gene therapy, an expression vector containing the nucleic 
acid is administered in such a manner that it becomes intracellular; i.e., by infection using a 
defective or attenuated retroviral or other viral vectors as described, for example in U.S. 
Patent 4,980,286 or by Robbins et al, Pharmacol. Ther. , 80 No. 1 pgs. 35-47 (1998). 
[0181] Retroviral vectors known in the art include those described in Miller et al, 
Meth. Enzymol. 217 pgs. 581-599 (1993), which have been modified 
to delete those retroviral sequences which are not required for packaging of the viral 
genome and subsequent integration into host cell DNA. Adenoviral vectors can also be 
used, which are advantageous due to their ability to infect non-dividing cells; such high-
capacity adenoviral vectors are described in Kochanek, Human Gene Therapy, 10, pgs. 
2451-2459 (1999). Chimeric viral vectors that can be used are those described by Reynolds 
et al, Molecular Medicine Today, pgs. 25-31 (1999). Hybrid vectors can also be used and 
are described by Jacoby et al, Gene Therapy, 4, pgs. 1282-1283 (1997). 

[0182] Direct injection of naked DNA, microparticle bombardment 
(e.g., Gene Gun®; Biolistic, Dupont) or coating of the DNA with lipids can also be used in gene 
therapy. Cell-surface receptors/transfecting agents, encapsulation in liposomes, 
microparticles or microcapsules, administration of the nucleic acid in linkage to a peptide 
which is known to enter the nucleus, or administration in linkage to a ligand predisposed 
to receptor-mediated endocytosis (see Wu & Wu, J. Biol. Chem., 262, pgs. 4429-4432 
(1987)) can be used to target cell types which specifically express the receptors of interest. 

[0183] In another embodiment a nucleic acid ligand compound may be produced in 
which the ligand comprises a fusogenic viral peptide designed so as to disrupt endosomes, 
thus allowing the nucleic acid to avoid subsequent lysosomal degradation. The nucleic acid 
may be targeted in vivo for cell specific endocytosis and expression by targeting a specific 
receptor such as that described in WO 92/06180, WO 93/14188 and WO 93/20221. 
Alternatively the nucleic acid may be introduced intracellularly and incorporated within the 
host cell genome for expression by homologous recombination. See, Zijlstra et al, Nature, 
342, pgs. 435-438 (1989). 

[0184] In ex vivo gene therapy, a gene is transferred into cells in vitro using tissue culture and 
the cells are delivered to the patient by various methods such as injecting subcutaneously, 
application of the cells into a skin graft and the intravenous injection of recombinant blood 
cells such as hematopoietic stem or progenitor cells. 

[0185] Cells into which a nucleic acid can be introduced for the purposes of gene 
therapy include, for example, epithelial cells, endothelial cells, keratinocytes, fibroblasts, 
muscle cells, hepatocytes and blood cells. The blood cells that can be used include, for 
example, T-lymphocytes, B-lymphocytes, monocytes, macrophages, neutrophils, 
eosinophils, megakaryocytes, granulocytes, hematopoietic cells or progenitor cells and the 
like. 

[0186] In yet another embodiment the present invention relates to protein chips or 
protein microarrays. It is well known in the art that microarrays can contain more than 
10,000 spots of a protein that can be robotically deposited on a surface of a glass slide or 
nylon filter. The proteins attach covalently to the slide surface, yet retain their ability to 
interact with other proteins or small molecules in solution. In some instances the protein 
samples can be made to adhere to glass slides by coating the slides with an aldehyde- 
containing reagent that attaches to primary amines. A process for creating microarrays is 
described, for example, by MacBeath and Schreiber in Science, Volume 289, Number 5485, 
pgs. 1760-1763 (2000) or Service, Science, Vol. 289, Number 5485, pg. 1673 (2000). An 
apparatus for controlling, dispensing and measuring small quantities of fluid is described, for 
example, in U.S. Patent No. 6,112,605. 

[0187] The present invention also provides a record of protein-protein interactions, 
PIM®'s, SID®'s and any data encompassed in the following Tables. It will be appreciated 
that this record can be provided in paper or electronic or digital form. 

[0188] In order to fully illustrate the present invention and advantages thereof, the 
following specific examples are given, it being understood that the same are intended only 
as illustrative and in no way limitative. 
EXAMPLES 

EXAMPLE 1 : Preparation of a collection of random-primed cDNA fragments 
1.A. Collection preparation and transformation in Escherichia coli 
1.A.1. Random-primed cDNA fragment preparation 

[0189] For the human placenta mRNA sample, random-primed cDNA was prepared 
from 5 µg of polyA+ mRNA using a TimeSaver cDNA Synthesis Kit (Amersham Pharmacia 
Biotech) and with 5 µg of random N9-mers according to the manufacturer's instructions. 
Following phenolic extraction, the cDNA was precipitated and resuspended in water. The 
resuspended cDNA was phosphorylated by incubating in the presence of T4 DNA Kinase 
(Biolabs) and ATP for 30 minutes at 37°C. The resulting phosphorylated cDNA was then 
purified over a separation column (Chromaspin TE 400, Clontech), according to the 
manufacturer's protocol. 
1.A.2. Ligation of linkers to blunt-ended cDNA 

Oligonucleotide HGX931 (5' end phosphorylated) 1 µg/µl and HGX932 1 µg/µl. 
Sequence of the oligo HGX931: 5'-GGGCCACGAA-3' (SEQ ID NO. 417) 
Sequence of the oligo HGX932: 5'-TTCGTGGCCCCTG-3' (SEQ ID NO. 418) 

[0190] Linkers were preincubated (5 minutes at 95°C, 10 minutes at 68°C, 15 minutes at 
42°C) then cooled down at room temperature and ligated with cDNA fragments at 16°C 
overnight. 

[0191] Linkers were removed on a separation column (Chromaspin TE 400, Clontech), 
according to the manufacturer's protocol. 
1.A.3. Vector preparation 

[0192] Plasmid pP6 (see Figure 10) was prepared by replacing the SpeI/XhoI fragment 
of pGAD3S2X with the double-stranded oligonucleotide: 

5'CTAGCCATGGCCGCAGGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAAGGGC 
CACTGGGGCCCCC 

GGTACCGGCGTCCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTTCCCGGTGAC 
CCCGGGGGAGCT 3' (SEQ ID NO. 419) 
[0193] The pP6 vector was successively digested with SfiI and BamHI restriction 
enzymes (Biolabs) for 1 hour at 37°C, extracted, precipitated and resuspended in water. 
Digested plasmid vector backbones were purified on a separation column 
(Chromaspin TE 400, Clontech), according to the manufacturer's protocol. 
1.A.4. Ligation between vector and insert of cDNA 

[0194] The prepared vector was ligated overnight at 15°C with the blunt-ended cDNA 
described in section 1.A.2 using T4 DNA ligase (Biolabs). The DNA was then precipitated and 
resuspended in water. 

1.A.5. Library transformation in Escherichia coli 

[0195] The DNA from section 1.A.4 was transformed into Electromax DH10B 
electrocompetent cells (Gibco BRL) with a Cell Porator apparatus (Gibco BRL). 1 ml SOC 
medium was added and the transformed cells were incubated at 37°C for 1 hour. 9 ml of 
SOC medium per tube was added and the cells were plated on LB+ampicillin medium. The 
colonies were scraped with liquid LB medium, aliquoted and frozen at -80°C. 

[0196] The obtained collection of recombinant cell clones is named HGXBPLARP1. 

1.B. Collection transformation in Saccharomyces cerevisiae 

[0197] The Saccharomyces cerevisiae strain (Y187 (MATα Gal4Δ Gal80Δ ade2-101, 
his3, leu2-3, -112, trp1-901, ura3-52 URA3::UASGAL1-LacZ Met)) was transformed with the 
cDNA library. 

[0198] The plasmid DNA contained in E. coli was extracted (Qiagen) from aliquoted 
frozen E. coli cells (1.A.5.). Saccharomyces cerevisiae yeast Y187 was grown in YPGlu. 

[0199] Yeast transformation was performed according to standard protocol (Gietz et al., 
Yeast, 11, 355-360, 1995) using yeast carrier DNA (Clontech). This experiment leads to 10⁴ 
to 5 x 10⁴ cells/µg DNA. 2 x 10⁴ cells were spread on DO-Leu medium per plate. The cells 
were aliquoted into vials containing 1 ml of cells and frozen at -80°C. 

[0200] The obtained collection of recombinant cell clones is named HGXYPLARP1 
(placenta). 

1.C. Construction of bait plasmids 

[0201] For fusions of the bait protein (listed in Table II) to the DNA-binding domain of the 
GAL4 protein of S. cerevisiae, bait fragments were cloned into plasmid pB6. For fusions of 
the bait protein to the DNA-binding domain of the LexA protein of E. coli, bait fragments were 
cloned into plasmid pB20. 

[0202] Plasmid pB6 (see Figure 3) was prepared by replacing the NcoI/SalI polylinker 
fragment of pASAA with the double-stranded DNA fragment: 

5' 

CATGGCCGGACGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAAGGGCCACTGG 
GGCCCCC 3' (SEQ ID NO. 420) 
3' 

CGGCCTGCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTTCCCGGTGACCCCGG 
GGGAGCT 5' (SEQ ID NO. 421) 

[0203] Plasmid pB20 (see Figure 6) was prepared by replacing the EcoRI/PstI polylinker 
fragment of pLex10 with the double-stranded DNA fragment: 

5' 

AATTCGGGGCCGGACGGGCCGCGGCCGCACTAGTGGGGATCCTTAATTAAGGGCCAC 

TGGGGCCCCTCGACCTGCA 3' (SEQ ID NO. 422) 

3' 

GCCCCGGCCTGCCCGGCGCCGGCGTGATCACCCCTAGGAATTAATTCCCGGTGACCC 
CGGGGAGCTGG 5' (SEQ ID NO. 423) 

[0204] The amplification of the bait ORF was obtained by PCR using the Pfu proof-
reading Taq polymerase (Stratagene), 10 pmol of each specific amplification primer and 
200 ng of plasmid DNA as template. 

[0205] The PCR program was set up as follows: 

94° 45" 
94° 45" 
415° 45" x30 cycles 
72° 6' 
72° 10' 
15° ∞ 

[0206] The amplification was checked by agarose gel electrophoresis. 
[0207] The PCR fragments were purified with Qiaquick column (Qiagen) according to 
the manufacturer's protocol. 

[0208] Purified PCR fragments were digested with adequate restriction enzymes. The 
PCR fragments were purified with Qiaquick column (Qiagen) according to the manufacturer's 
protocol. 

[0209] The digested PCR fragments were ligated into an adequately digested and 
dephosphorylated bait vector (pB6 or pB20) according to standard protocol (Sambrook et al.) 
and were transformed into competent bacterial cells. The cells were grown, the DNA 
extracted and the plasmid was sequenced. 

Example 2 : Screening the collection with the two-hybrid in yeast system 
2.A. The mating protocol 

[0210] The mating two-hybrid in yeast system (as described by Legrain et al., Nature 
Genetics, vol. 16, 277-282 (1997), Toward a functional analysis of the yeast genome through 
exhaustive two-hybrid screens) was used for its advantages but one could also screen the 
cDNA collection in classical two-hybrid system as described in Fields et al. or in a yeast 
reverse two-hybrid system. 

[0211] The mating procedure allows a direct selection on selective plates because the 
two fusion proteins are already produced in the parental cells. No replica plating is required. 
[0212] This protocol was written for the use of the library transformed into the Y187 
strain. 

[0213] For bait proteins fused to the DNA-binding domain of GAL4, bait-encoding 
plasmids were first transformed into S. cerevisiae (CG1945 strain (MATa Gal4-542 Gal80-
538 ade2-101 his3Δ200, leu2-3,112, trp1-901, ura3-52, lys2-801, URA3::GAL4 17mers (X3)-
CyC1TATA-LacZ, LYS2::GAL1UAS-GAL1TATA-HIS3 CYHR)) according to step 1.B. and 
spread on DO-Trp medium. 

[0214] For bait proteins fused to the DNA-binding domain of LexA, bait-encoding 
plasmids were first transformed into S. cerevisiae (L40Δgal4 strain (MATa ade2, trp1-901, 
leu2-3,112, lys2-801, his3Δ200, LYS2::(lexAop)4-HIS3, ura3-52::URA3 (lexAop)8-LacZ, 
GAL4::KanR)) according to step 1.B. and spread on DO-Trp medium. 
Day 1, morning : preculture 

[0215] The cells carrying the bait plasmid obtained at step 1.C. were precultured in 
20 ml DO-Trp medium and grown at 30°C with vigorous agitation. 
Day 1, late afternoon : culture 

[0216] The OD600nm of the DO-Trp pre-culture of cells carrying the bait plasmid 
was measured. The OD600nm must lie between 0.1 and 0.5 in order to correspond to a 
linear measurement. 50 ml DO-Trp at OD600nm 0.006/ml was inoculated and grown 
overnight at 30°C with vigorous agitation. 
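
The inoculation step above amounts to a simple dilution calculation, which may be sketched as follows. This is a minimal illustration only: the function name, the optional dilution factor applied to the sample before reading, and the example reading are assumptions and are not part of the protocol as written.

```python
def inoculum_volume_ml(measured_od: float,
                       dilution_factor: float = 1.0,
                       target_od: float = 0.006,
                       final_volume_ml: float = 50.0) -> float:
    """Volume of pre-culture needed to start a culture at target_od.

    measured_od is the OD600nm reading of the (possibly diluted) sample;
    it must fall in the 0.1 to 0.5 range stated in the protocol.
    """
    if not 0.1 <= measured_od <= 0.5:
        raise ValueError("reading outside the linear range 0.1 to 0.5")
    culture_od = measured_od * dilution_factor  # OD of the undiluted pre-culture
    return target_od * final_volume_ml / culture_od


# Hypothetical reading of 0.3 on a 1/10 dilution of the pre-culture.
print(f"{inoculum_volume_ml(0.3, dilution_factor=10):.3f} ml")  # prints "0.100 ml"
```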
Day 2 : mating 
medium and plates 

1 YPGlu 15 cm plate 
50 ml tube with 13 ml DO-Leu-Trp-His 
100 ml flask with 5 ml of YPGlu 
8 DO-Leu-Trp-His plates 

2 DO-Leu plates 
2 DO-Trp plates 
2 DO-Leu-Trp plates 

[0217] The OD600nm of the DO-Trp culture was measured. It should be around 1. 
[0218] For the mating, twice as many bait cells as library cells were used. To get a good 
mating efficiency, one must collect the cells at 10⁸ cells per cm². 
[0219] The amount of bait culture (in ml) that makes up 50 OD600nm units for the 
mating with the prey library was estimated. 

[0220] A vial containing the HGXYCDNA1 library was thawed slowly on ice. 1.0 ml of the 
vial was added to 5 ml YPGlu. Those cells were recovered at 30°C, under gentle agitation 
for 10 minutes. 
Mating 

[0221] The 50 OD600nm units of bait culture were placed into a 50 ml Falcon tube. 
[0222] The HGXYCDNA1 library culture was added to the bait culture, then centrifuged, 
the supernatant discarded and the cells resuspended in 1.6 ml YPGlu medium. 
[0223] The cells were distributed onto two 15 cm YPGlu plates with glass beads. The 
cells were spread by shaking the plates. The plates were incubated cells-up at 30°C for 
4h30min. 

Collection of mated cells 

[0224] The plates were washed and rinsed with 6 ml and 7 ml respectively of DO-Leu-
Trp-His. Two parallel serial ten-fold dilutions were performed in 500 µl DO-Leu-Trp-His up to 
1/10,000. 50 µl of each 1/10,000 dilution was spread onto DO-Leu and DO-Trp plates and 
50 µl of each 1/1,000 dilution onto DO-Leu-Trp plates. 22.4 ml of collected cells were spread 
in 400 µl aliquots on DO-Leu-Trp-His+Tet plates. 
Day 4 

[0225] Clones that were able to grow on DO-Leu-Trp-His+Tetracyclin were then 
selected. This medium allows one to isolate diploid clones presenting an interaction. 
[0226] The His+ colonies were counted on control plates. 

[0227] The number of His+ cell clones will define which protocol is to be processed: 
[0228] Upon 60 x 10⁶ Trp+Leu+ colonies: 

- if the number of His+ cell clones < 285: then process via the luminometry protocol on all 
colonies 

- if the number of His+ cell clones > 285 and <5000: then process via overlay and then 
luminometry protocols on blue colonies (2.B and 2.C). 

- if number of His+ cell clones >5000 : repeat screen using DO-Leu-Trp-His+Tetracyclin 
plates containing 3-aminotriazol. 
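
The decision rule above may be sketched as follows, for illustration only. The function and constant names are hypothetical. Two assumptions are made that the text does not state: the His+ count is scaled linearly to the reference of 60 x 10⁶ Trp+Leu+ colonies when a different number of diploids was screened, and counts of exactly 285 or exactly 5000 are assigned to the intermediate branch.

```python
REFERENCE_DIPLOIDS = 60e6  # 60 x 10^6 Trp+Leu+ colonies, as stated in the text


def choose_protocol(his_plus: int, diploids: float = REFERENCE_DIPLOIDS) -> str:
    """Pick the follow-up protocol from the number of His+ clones."""
    if his_plus < 0 or diploids <= 0:
        raise ValueError("counts must be non-negative and diploids positive")
    # Assumption: scale the count to the 60 x 10^6 reference.
    scaled = his_plus * REFERENCE_DIPLOIDS / diploids
    if scaled < 285:
        return "luminometry on all colonies"
    if scaled <= 5000:  # assumption: boundary values fall in this branch
        return "X-Gal overlay, then luminometry on blue colonies"
    return "repeat screen on plates containing 3-aminotriazol"


for count in (100, 1000, 6000):
    print(count, "->", choose_protocol(count))
```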

2.B. The X-Gal overlay assay 

[0229] The X-Gal overlay assay was performed directly on the selective medium plates 
after scoring the number of His+ colonies. 
Materials 

[0230] A waterbath was set up. The water temperature should be 50°C. 
0.5 M Na2HPO4 pH 7.5. 
1.2% Bacto-agar. 
2% X-Gal in DMF. 
Overlay mixture: 0.25 M Na2HPO4 pH 7.5, 0.5% agar, 0.1% SDS, 7% DMF (LABOSI), 0.04% 
X-Gal (ICN). For each plate, 10 ml overlay mixture are needed. 
DO-Leu-Trp-His plates. 
Sterile toothpicks. 
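
As a worked illustration, the volumes of the listed stock solutions needed to reach the overlay mixture concentrations can be estimated with the usual C1 x V1 = C2 x V2 relation. The sketch below is not part of the protocol: the function name is hypothetical, the SDS stock concentration (10%) is an assumption because no SDS stock is listed, and the 2% X-Gal stock is treated as contributing its whole volume as DMF.

```python
def overlay_volumes_ml(n_plates: int,
                       ml_per_plate: float = 10.0,
                       sds_stock_pct: float = 10.0) -> dict:
    """Stock volumes (ml) for the overlay mixture, by C1*V1 = C2*V2."""
    if n_plates < 1:
        raise ValueError("at least one plate is required")
    total = n_plates * ml_per_plate
    xgal_stock = total * 0.04 / 2.0  # 2% stock -> 0.04% final
    return {
        "0.5 M Na2HPO4 pH 7.5": total * 0.25 / 0.5,  # -> 0.25 M final
        "1.2% Bacto-agar": total * 0.5 / 1.2,        # -> 0.5% final
        "2% X-Gal in DMF": xgal_stock,
        "SDS stock (assumed 10%)": total * 0.1 / sds_stock_pct,  # -> 0.1% final
        # DMF still to add once the DMF carried by the X-Gal stock is counted.
        "additional DMF": total * 7.0 / 100.0 - xgal_stock,
    }


for name, ml in overlay_volumes_ml(n_plates=1).items():
    print(f"{name}: {ml:.2f} ml")
```

Under these assumptions the components for one plate add up to approximately 9.97 ml, which is consistent with the 10 ml per plate stated above.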

Experiment 

[0231] The temperature of the overlay mix should be between 45°C and 50°C. The 
overlay-mix was poured over the plates in portions of 10 ml. When the top layer had settled, 
the plates were collected. The plates were incubated overlay-up at 30°C and the time was noted. 
Blue colonies were checked for regularly. If no blue colony appeared, overnight incubation 
was performed. The number of positives was marked using a pen. The positive colonies 
were streaked on fresh DO-Leu-Trp-His plates with a sterile toothpick. 

2.C. The luminometry assay 

[0232] His+ colonies were grown overnight at 30°C in microtiter plates containing DO- 
Leu-Trp-His+Tetracyclin medium with shaking. The day after, the overnight culture was 
diluted 15 times into a new microtiter plate containing the same medium and was incubated 
for 5 hours at 30°C with shaking. The samples were diluted 5 times and the OD600nm was read. The 
samples were diluted again to obtain between 10,000 and 75,000 yeast cells/well in 100 µl 
final volume. 

[0233] Per well, 76 µl of One Step Yeast Lysis Buffer (Tropix), 20 µl of 
Sapphire II Enhancer (Tropix) and 4 µl of Galacton Star (Tropix) were added and incubated for 40 minutes at 30°C. 
The β-Gal read-out (L) was measured using a Luminometer (Trilux, Wallac). The value of 
(OD600nm x L) was calculated and interacting preys having the highest values were selected. 
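
The selection step above may be sketched as follows, for illustration only. The score is computed exactly as written in the text (OD600nm x L); the function name, clone names and readings are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_clones(readings: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank clones by the value (OD600nm x L), highest first.

    readings maps a clone name to (OD600nm, luminometer read-out L).
    """
    scored = [(name, od * lum) for name, (od, lum) in readings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical readings, for illustration only.
example = {"A": (0.2, 1000.0), "B": (0.5, 3000.0), "C": (0.4, 1000.0)}
for name, score in rank_clones(example):
    print(name, score)
```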
[0234] At this step of the protocol, diploid cell clones presenting interaction were 
isolated. The next step was now to identify polypeptides involved in the selected interactions. 
Example 3 : Identification of positive clones 

3.A. PCR on yeast colonies 
Introduction 

[0235] PCR amplification of fragments of plasmid DNA directly on yeast colonies is a 
quick and efficient procedure to identify sequences cloned into this plasmid. It is directly 
derived from a published protocol (Wang H. et al., Analytical Biochemistry, 237, 145-146, 
(1996)). 

[0236] However, it is not a standardized protocol: it varies from strain to strain and 
depends on experimental conditions (number of cells, Taq polymerase source, etc). This 
protocol should be optimized to specific local conditions. 
Materials 

[0237] For 1 well, PCR mix composition was: 
32.5 µl water, 
5 µl 10X PCR buffer (Pharmacia), 
1 µl dNTP 10 mM, 
0.5 µl Taq polymerase (5 u/µl) (Pharmacia), 
0.5 µl oligonucleotide ABS1 10 pmole/µl: 5'-GCGTTTGGAATCACTACAGG-3' (SEQ ID NO. 424), 
0.5 µl oligonucleotide ABS2 10 pmole/µl: 5'-CACGATGCACGTTGAAGTG-3' (SEQ ID NO. 425). 
1 N NaOH. 
Experiment 

[0238] The positive colonies were grown overnight at 30°C in a 96-well cell culture 
cluster (Costar) containing 150 µl DO-Leu-Trp-His+Tetracyclin with shaking. The culture 
was resuspended and 100 µl was transferred immediately to a Thermowell 96 (Costar) and 
centrifuged for 5 minutes at 4,000 rpm at room temperature. The supernatant was removed. 
5 µl NaOH was added to each well and shaken for 1 minute. 

[0239] The Thermowell was placed in the thermocycler (GeneAmp 9700, Perkin Elmer) 
for 5 minutes at 99.9°C and then 10 minutes at 4°C. In each well, the PCR mix was added 
and shaken well. 

[0240] The PCR program was set up as follows: 

94°C 3 minutes 
94°C 30 seconds 
53°C 1 minute 30 seconds x 35 cycles 
72°C 3 minutes 
72°C 5 minutes 
15°C ∞ 

[0241] The quality, the quantity and the length of the PCR fragment were checked on an 
agarose gel. The length of the cloned fragment was the estimated length of the PCR 
fragment minus 300 base pairs that corresponded to the amplified flanking plasmid 
sequences. 
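
For illustration, the programmed hold time of the colony PCR and the estimate of the cloned fragment length can be computed as in the following sketch. The function names are hypothetical; it is assumed that the 94°C/53°C/72°C steps form the 35-cycle block, and ramp times and the final 15°C hold are ignored.

```python
def total_runtime_s(initial_s: int = 180,
                    cycle_steps_s: tuple = (30, 90, 180),
                    cycles: int = 35,
                    final_s: int = 300) -> int:
    """Sum of programmed hold times, in seconds (ramping not included)."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


def insert_length_bp(pcr_fragment_bp: int, flanking_bp: int = 300) -> int:
    """Cloned fragment length = PCR fragment length minus flanking plasmid sequence."""
    if pcr_fragment_bp < flanking_bp:
        raise ValueError("PCR fragment shorter than the flanking sequences")
    return pcr_fragment_bp - flanking_bp


print(total_runtime_s() / 60, "minutes")  # prints "183.0 minutes"
print(insert_length_bp(1200), "bp")       # prints "900 bp"
```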

[0242] 3.B. Plasmid rescue from yeast by electroporation 
Introduction 

[0243] The previous protocol of PCR on yeast cells may not be successful; in such a 
case, plasmids can be rescued from yeast by electroporation. This experiment allows the 
recovery of prey plasmids from yeast cells by transformation of E. coli with a yeast cellular 
extract. The prey plasmid can then be amplified and the cloned fragment can be sequenced. 
Materials 
[0244] Plasmid rescue 

Glass beads 425-600 µm (Sigma) 
Phenol/chloroform (1/1) premixed with isoamyl alcohol (Amresco) 
Extraction buffer: 2% Triton X100, 1% SDS, 100 mM NaCl, 10 mM TrisHCl pH 8.0, 1 mM 
EDTA pH 8.0. 
Mix ethanol/NH4Ac: 6 volumes ethanol with 7.5 M NH4 acetate 
70% ethanol 
Yeast cells in patches on plates. 

Electroporation 

SOC medium 

M9 medium 

Selective plates : M9-Leu+Ampicillin 

2 mm electroporation cuvettes (Eurogentech) 

Experiment 

Plasmid rescue 

[0245] The cell patch on DO-Leu-Trp-His was prepared with the cell culture of section 
2.C. The cells of each patch were scraped into an Eppendorf tube, 300 µl of glass beads was 
added to each tube, then 200 µl extraction buffer and 200 µl phenol:chloroform:isoamyl 
alcohol (25:24:1) were added. 

[0246] The tubes were centrifuged for 10 minutes at 15,000 rpm. 

[0247] 180 µl supernatant was transferred to a sterile Eppendorf tube, 500 µl of 
ethanol/NH4Ac was added and the tubes were vortexed. The tubes were centrifuged for 15 
minutes at 15,000 rpm at 4°C. The pellet was washed with 200 µl 70% ethanol, the 
ethanol was removed and the pellet was dried. The pellet was resuspended in 10 µl water. 
Extracts were stored at -20°C. 

Electroporation 

Materials : 

[0248] Electrocompetent MC1066 cells prepared according to standard protocols 
(Sambrook et al. supra). 

1 µl of yeast plasmid DNA-extract was added to a pre-chilled Eppendorf tube, and kept on 
ice. 

The 1 µl yeast plasmid DNA-extract sample was mixed with 20 µl of electrocompetent cells 
and transferred into a cold electroporation cuvette. The Biorad electroporator was set to 200 
ohms resistance, 25 µF capacitance and 2.5 kV. The cuvette was placed in the cuvette holder and 
electroporated. 

1 ml of SOC was added into the cuvette and the cell-mix was transferred into a sterile 
Eppendorf tube. The cells were recovered for 30 minutes at 37°C, then spun down for 
1 minute at 4,000 x g and the supernatant was poured off. About 100 µl of medium was kept 
and used to resuspend the cells and spread them on selective plates (e.g., M9-Leu plates). 
The plates were then incubated for 36 hours at 37°C. 

[0249] One colony was grown and the plasmids were extracted. The presence 
and size of the insert were checked through enzymatic digestion and agarose gel electrophoresis. The 
insert was then sequenced. 
Example 4 : Protein-protein interaction 

[0250] For each bait, the previous protocol leads to the identification of prey 
polynucleotide sequences. Using a suitable software program (e.g., Blastwun, available on 
the Internet site of the University of Washington 
http://bioweb.pasteur.fr/seqanal/interfaces/blastwu.html ), it may be determined which mRNA transcript 
is encoded by the prey fragment, and whether or not the fusion protein 
encoded is in the same open reading frame of translation as the predicted protein. 

[0251] Alternatively, prey nucleotide sequences can be compared with one another and 
those which share identity over a significant region (60nt) can be grouped together to form a 
contiguous sequence (Contig) whose identity can be ascertained in the same manner as for 
individual prey fragments described above. 
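
One possible way of grouping prey sequences that share identity over a region of at least 60 nt is sketched below, for illustration only. It is a simplification and not the method actually used: identity is taken to mean an exact shared substring on the same strand, sequencing errors are ignored, and the function name and the example sequences are hypothetical.

```python
from typing import Dict, List, Set


def group_by_shared_region(seqs: Dict[str, str], min_len: int = 60) -> List[Set[str]]:
    """Group sequences that share an identical substring of at least min_len nt."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner: Dict[str, str] = {}  # substring -> first sequence containing it
    for name, seq in seqs.items():
        s = seq.upper()
        for i in range(len(s) - min_len + 1):
            owner = first_owner.setdefault(s[i:i + min_len], name)
            if owner != name:
                parent[find(name)] = find(owner)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy example with a shortened window (8 nt) so the sequences stay readable.
toy = {"a": "ACGTACGTTTGACCA", "b": "TTACGTACGTTTGA", "c": "GGGGGGGGCCCCCCCC"}
print(group_by_shared_region(toy, min_len=8))
```

In the toy example, sequences a and b fall into one group and c stays alone; with real data the default window of 60 nt stated in the text would be used.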

Example 5 : Identification of SID® 

[0252] By comparing and selecting the intersection of all isolated fragments that are 
included in the same polypeptide, one can define the Selected Interacting Domain (SID®) as 
illustrated in Figure 15. The SID® is illustrated in Table III. 
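
The intersection described above can be illustrated with the following minimal sketch. It assumes, for illustration only, that each isolated fragment is represented by its first and last residue positions on the same polypeptide (1-based, inclusive); the function name and the coordinates are hypothetical.

```python
from typing import List, Optional, Tuple


def selected_interacting_domain(
        fragments: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return the region common to all fragments, or None if they do not all overlap."""
    if not fragments:
        raise ValueError("at least one fragment is required")
    start = max(s for s, _ in fragments)  # latest start
    end = min(e for _, e in fragments)    # earliest end
    return (start, end) if start <= end else None


# Hypothetical fragment coordinates on one polypeptide.
print(selected_interacting_domain([(10, 200), (50, 180), (30, 250)]))  # prints (50, 180)
```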
Example 6 : Identification of PIM® 

[0253] The PIM® is then constructed using methods known in the art as exemplified in 
Figure 16. 

Example 7 : Making of polyclonal and monoclonal antibodies 

[0254] The protein-protein complex of columns 1 and 3 of Table II was injected into mice 
and polyclonal and monoclonal antibodies were made following the procedure set forth in 
Sambrook et al. (supra). 

[0255] More specifically, mice are immunized with an immunogen comprising Table II 
complexes conjugated to keyhole limpet hemocyanin using glutaraldehyde or EDC as is well 
known in the art. The complexes can also be stabilized by crosslinking as described in WO 
00/37483. The immunogen is then mixed with an adjuvant. Each mouse receives four 
injections of 10 µg to 100 µg of immunogen, and after the fourth injection, blood samples are 
taken from the mice to determine if the serum contains antibodies to the immunogen. Serum 
titer is determined by ELISA or RIA. Mice with sera indicating the presence of antibody to 
the immunogen are selected for hybridoma production. 
[0256] Spleens are removed from immune mice and a single-cell suspension is prepared 
(Harlow et al 1988). Cell fusions are performed essentially as described by Kohler et al 
(1976). Briefly, P365.3 myeloma cells (ATCC, Rockville, Md) or NS-1 myeloma cells are 
fused with spleen cells using polyethylene glycol as described by Harlow et al (1989). Cells 
are plated at a density of 2 x 10⁵ cells/well in 96-well tissue culture plates. Individual wells 
are examined for growth and the supernatants of wells with growth are tested for the 
presence of the complex-specific antibodies by ELISA or RIA using one of the proteins set 
forth in Table II as a target protein. Cells in positive wells are expanded and subcloned to 
establish and confirm monoclonality. 

[0257] Clones with the desired specificities are expanded and grown as ascites in mice 
or in a hollow fiber system to produce sufficient quantities of antibodies for characterization 
and assay development. Antibodies are tested for binding to one of the proteins in Table II, 
to determine which are specific for the Table II complexes as opposed to those that bind to 
the individual proteins. More specifically, antibodies are tested for binding to bait polypeptide 
of column 1 of Table II alone or to prey polypeptide of column 3 of Table II alone, to 
determine which are specific for the protein-protein complex of columns 1 and 3 of Table II 
as opposed to those that bind to the individual proteins. 
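
The specificity test described above can be sketched as a simple classification. This is a hedged illustration only: it assumes that each clone yields three assay signals (bait alone, prey alone, and the complex) and that a single cutoff separates binding from background. The class, the function names, the cutoff and the readings are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BindingResult:
    """Assay signals for one hybridoma clone (hypothetical data layout)."""
    clone: str
    bait_alone: float      # signal against the bait polypeptide alone
    prey_alone: float      # signal against the prey polypeptide alone
    complex_signal: float  # signal against the bait-prey complex


def is_complex_specific(result: BindingResult, cutoff: float) -> bool:
    """True when the antibody binds the complex but neither individual protein."""
    binds_complex = result.complex_signal > cutoff
    binds_individual = result.bait_alone > cutoff or result.prey_alone > cutoff
    return binds_complex and not binds_individual


def complex_specific_clones(results: List[BindingResult], cutoff: float) -> List[str]:
    """Return the names of clones classified as complex-specific."""
    return [r.clone for r in results if is_complex_specific(r, cutoff)]


# Hypothetical readings (illustrative values only).
results = [
    BindingResult("clone_1", bait_alone=0.05, prey_alone=0.07, complex_signal=1.40),
    BindingResult("clone_2", bait_alone=1.10, prey_alone=0.06, complex_signal=1.30),
    BindingResult("clone_3", bait_alone=0.04, prey_alone=0.05, complex_signal=0.06),
]
print(complex_specific_clones(results, cutoff=0.2))
```

Here clone_2 is excluded because it also binds the bait alone, and clone_3 because it does not bind the complex.
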

[0258] Monoclonal antibodies against each of the complexes set forth in columns 1 and
3 of Table II are prepared in a similar manner by mixing the specified proteins together,
immunizing an animal, fusing spleen cells with myeloma cells and isolating clones which
produce antibodies specific for the protein complex, but not for individual proteins.

Example 8: Modulating compounds/PIM screening

[0259] Each specific protein-protein complex of columns 1 and 3 of Table II may be used 
to screen for modulating compounds. 

[0260] One appropriate construction for this modulating compound screening may be: 

- bait polynucleotide inserted in pB6 or pB20;

- prey polynucleotide inserted in pP6;

- transformation of these two vectors in a permeable yeast cell;

- growth of the transformed yeast cell on medium containing compound to be tested;

- and observation of the growth of the yeast cells.
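
The growth readout in the steps above can be sketched as a comparison against an untreated control. This is a minimal illustration under stated assumptions: it assumes that growth depends on the bait-prey interaction, so that reduced growth in the presence of a compound suggests an inhibitor of the interaction, and that growth is measured as an optical density. The function names, the threshold and the values are hypothetical; a compound that is simply toxic to yeast would give the same readout and would need a separate control.

```python
from typing import Dict, List


def growth_ratio(od_with_compound: float, od_control: float) -> float:
    """Growth with the compound relative to the untreated control culture."""
    if od_control <= 0:
        raise ValueError("control culture must show growth")
    return od_with_compound / od_control


def candidate_inhibitors(
    od_by_compound: Dict[str, float], od_control: float, max_ratio: float = 0.5
) -> List[str]:
    """Return compounds whose relative growth is at or below `max_ratio`."""
    return sorted(
        name
        for name, od in od_by_compound.items()
        if growth_ratio(od, od_control) <= max_ratio
    )


# Hypothetical optical densities (illustrative values only).
measurements = {"compound_A": 0.2, "compound_B": 0.9}
print(candidate_inhibitors(measurements, od_control=1.0))
```
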

[0261] The results obtained from these Examples, as well as the teachings in
the specification, are set forth in the Tables below.

[0262] While the invention has been described in terms of the various preferred 
embodiments, the skilled artisan will appreciate that various modifications, substitutions, 
omissions and changes may be made without departing from the scope thereof. 
Accordingly, it is intended that the present invention be limited by the scope of the following 
claims, including equivalents thereof. 
[0263] All patent and non-patent publications cited in this specification, including
the websites set forth on pages 8, 13 and 33, are indicative of the level of skill of
those skilled in the art to which this invention pertains. All these publications and
patent applications are herein incorporated by reference to the same extent as if
each individual publication or patent application was specifically and individually
indicated to be incorporated herein by reference.
FLSKLKKMFTS*
RELDALLEQQNTIESKMVTL

HRMGPNLQLIEGDAKQLAG 

MITFTCNLAENVSSKVRQL 

DLAKNRLYQAIQRADDILDL 

KFCMDGVQTALRSEDYEQ 

AAAHIHRYLCLDKSVIELSR 

QGKGGSMIDANLKLLQEAE 

QRLKAIVAEKFAIATKEGDL 

PQVERFFKIFPLLGLHEEGL 

RRFSEYLCKQVASKAEENL 

LMVLGTDMSDRRAAVIFAD 

TLTLLFEGIARIVEAHQPIVE 

TYYGPGRLYTLIKYLQVEC 

DRQVEKVVDKFIKQRDYHQ 

QFRHVQNNLMRNSTTEKIE 

PRELDPILTEVTLMNARSEL 
TNNGGCNTNGGNTNGGGNGCNTNTGTNTNGNCNNNGTTGTTGNGNNNANG
GCGNGNNCGGNGNCNGTTGATTNNCAGGTNTGGNNNNGNTGGNGGCNCNT 
GGCNCCTNGCATNTN 


GTTTGATCAGCCTCAGGAATACTTCATGGAGTTGACATTCAATCAAGCTGCAA 

AGGGGGTCAACAAGGAGTTCACCGTGAACATCATGGACACGTGTGAGCGCT 

GCAACGGCAAGGGGAACGAGCCCGGCACCAAGGTGCAGCATTGCCACTAC 

TGTGGCGGCTCCGGCATGGAAACCATCAACACAGGCCCTTTTGTGATGCGT 

TCCACGTGTAGGAGATGTGGTGGCCGCGGCTCCATCATCATATCGCCCTGT 

GTGGTCTGCAGGGGAGCAGGACAAGCCAAGCAGAAAAAGCGAGTGATGATC 

CCTGTGCCTGCAGGAGTCGAGGATGGCCAGACCGTGAGGATGCCTGTGGG 

AAAAAGGGAAATTTTCATTACGTTCAGGGTGCAGAAAAGCCCTGTGTTCCGG 

AGGGACGGCGCAGACATCCACTCCGACCTCTTTATTTCTATAGCTCAGGCTC 

TTCTTGGGGGAACAGCCAGAGCCCAGGGCCTGTACGAGACGATCAACGTGA 

CGATCCCCCCTGGGACTCAGACAGACCAGAAGATTCGGATGGGTGGGAAAG 

GCATCCCCCGGATTAACAGCTACGGCTACGGAGACCACTACATCCACATCAA 

GATACGAGTTCCAAAGAGGCTAACGAGCCGGCAGCAGAGCCTGATCCTGAG 

CTACGCCGAGGACGAGACAGATGTGGAGGGGACGGTGAACGGCGTCACCC 

TCACCAGCTCTGGTGGCAGCACCATGGATAGCTCCGCAGGAAGCAAGGCTA 

GGCGTGAGGCTGGGGAGGACGAGGAGGGATTCCTTTCCAAACTTAAGAAAA 

TGTTTACCTCATGA 


ATGGCGGACCTTGATTCGCCTCCGAAGCTGTCAGGGGTGCAGCAGCCGTCT 

GAGGGGGTGGGAGGTGGCCGCTGCTCCGAAATCTCCGCTGAGCTCATTCG 

CTCCCTGACAGAGCTGCAGGAGCTGGAGGCTGTATACGAACGGCTCTGCGG 

CGAGGAGAAAGTGGTGGAGAGAGAGCTGGATGCTCTTTTGGAACAGCAAAA 

CACCATTGAAAGTAAGATGGTCACTCTCCACCGAATGGGTCCTAATCTGCAG

CTGATTGAGGGAGATGCAAAGCAGCTGGCTGGAATGATCACCTTTACCTGCA 

ACCTGGCTGAGAATGTGTCCAGCAAAGTTCGTCAGCTTGACCTGGCCAAGAA 

CCGCCTCTATCAGGCCATTCAGAGAGCTGATGACATCTTGGACCTGAAGTTC 

TGCATGGATGGAGTTCAGACTGCTTTGAGGAGTGAAGATTATGAGCAGGCTG 

CAGCACATATTCATCGCTACTTGTGCCTGGACAAGTCGGTCATTGAGCTCAG 

CCGACAGGGCAAAGGGGGGAGCATGATTGATGCCAACCTGAAATTGCTGCA 

GGAAGCTGAGCAACGTCTCAAAGCCATTGTGGCAGAGAAGTTTGCCATTGC 

CACCAAGGAAGGTGATTTGCCCCAGGTGGAGCGCTTCTTCAAGATCTTCCCA 

CTGCTGGGTTTGCATGAGGAGGGATTAAGAAGGTTCTCGGAGTACCTTTGCA 

AGCAGGTGGCCAGTAAAGCTGAGGAGAATCTGCTCATGGTGCTGGGGACAG 

ACATGAGTGATCGGAGAGCTGCAGTCATCTTTGCAGATACACTTACTCTTCT 

GTTTGAAGGGATTGCCCGCATTGTGGAGGCCCACCAGCCAATAGTGGAGAC 

CTATTATGGGCCAGGGAGACTCTATACCCTGATCAAATATCTGCAGGTGGAA 

TGTGACAGACAGGTGGAGAAGGTGGTAGACAAGTTCATCAAGCAAAGGGAC 
MGIGLSAQGVNMNRLPGW

DKHSYGYHGDDGHSFCSS 

GTGQPYGPTFTTGDVIGCC 

VNLINNTCFYTKNGHSLGIA 

FTDLPPNLYPTVGLQTPGE 

VVDANFGQHPFVFDIEDYM 

REWRTKIQAQIDRFPIGDR 

EGEWQTMIQKMVSSYLVH 

HGYCATAEAFARSTDQTVL 

EELASIKNRQRIQKLVLAGR 

MGEAIETTQQLYPSLLERN 

PNLLFTLKVRQFIEMVNGT 

DSEVRCLGGRSPKSQDSY 

PVSPRPFSSPSMSPSHGM 

NIHNLASGKGSTAHFSGFE 

SCSNGVISNKAHQSYCHSN 

KHQSSNLNVPELNSINMSR 

SQQVNNFTSNDVDMETDH 

YSNGVGETSSNGFLNGSS 

KHDHEMEDCDTEMEVDSS 

QLRRQLCGGSQAAIERMIH 

FGRELQAMSEQLRRDCGK 

NTANKKMLKDAFSLLAYSD 

PWNSPVGNQLDPIQREPV 

CSALNSAILETHNLPKQPPL 

ALAMGQATQCLGLMARSGI 

GSCAFATVEDYLH* 


lEIHGKAGLFLEGQIHPELE 
GVEIVISEKGASSPLITVFTD 
DKGAYSVGPLHSDLEYTVT 
SQKEGYVLTAVEGTIGDFK 
AYALAGVSFEIKAEDDQPL 
PGVLLSLSGGLFRSNLLTQ 
AGCTGTGTGTCCCCAGGCTCACTATAACATGTACCCACAGAACCCTGACTCA
GTCCTTGACACCGATGGGGACTTCGATCTGGAGGACACAATGGACGTAGCG 
CGGCGTGTGGAGGAGCTCCTGGGCCGGCCAATGGACAGTCAGTGGATCCC 
GCACGCACAATCGTGA 


ATGGGAATTGGTCTTTCTGCTCAAGGTGTGAACATGAATAGACTACCAGGTT 

GGGATAAGCATTCATATGGTTACCATGGGGATGATGGACATTCGTTTTGTTCT 

TCTGGAACTGGACAACCTTATGGACCAACTTTCACTACTGGTGATGTCATTG 

GCTGTTGTGTTAATCTTATCAACAATACCTGCTTTTACACCAAGAATGGACAT 

AGTTTAGGTATTGCTTTCACTGACCTACCGCCAAATTTGTATCCTACTGTGGG 

GCTTCAAACACCAGGAGAAGTGGTCGATGCCAATTTTGGGCAACATCCTTTC 

GTGTTTGATATAGAAGACTATATGCGGGAGTGGAGAACCAAAATCCAGGCAC 

AGATAGATCGATTTCCTATCGGAGATCGAGAAGGAGAATGGCAGACCATGAT 

ACAAAAAATGGTTTCATCTTATTTAGTCCACCATGGGTACTGTGCCACAGCAG 

AGGCCTTTGCCAGATCTACAGACCAGACCGTTCTAGAAGAATTAGCTTCCAT 

TAAGAATAGACAAAGAATTCAGAAATTGGTATTAGCAGGAAGAATGGGAGAA 

GCCATTGAAACAACACAACAGTTATACCCAAGTTTACTTGAAAGAAATCCTAA 

TCTCCTTTTCACATTAAAAGTGCGTCAGTTTATAGAAATGGTGAATGGTACAG 

ATAGTGAAGTACGATGTTTGGGAGGCCGAAGTCCAAAGTCTCAAGACAGTTA 

TCCTGTTAGTCCTCGACCTTTTAGTAGTCCAAGTATGAGCCCCAGCCATGGA 

ATGAATATCCACAATTTAGCATCAGGCAAAGGAAGCACCGCACATTTTTCAG 

GTTTTGAAAGTTGTAGTAATGGTGTAATATCAAATAAAGCACATCAATCATATT 

GCCATAGTAATAAACACCAGTCATCCAACTTGAATGTACCAGAACTAAACAGT 

ATAAATATGTCAAGATCACAGCAAGTTAATAACTTCACCAGTAATGATGTAGA 

CATGGAAACAGATCACTACTCCAATGGAGTTGGAGAAACTTCATCCAATGGT 

TTCCTAAATGGTAGCTCTAAACATGACCACGAAATGGAAGATTGTGACACCG 

AAATGGAAGTTGATTCAAGTCAGTTGAGACGCCAGTTGTGTGGAGGAAGTCA 

GGCCGCCATAGAAAGAATGATCCACTTTGGACGAGAGCTGCAAGCAATGAG 

TGAACAGCTAAGGAGAGACTGTGGCAAGAACACTGCAAACAAAAAAATGTTG 

AAGGATGCATTCAGTCTACTAGCATATTCAGATCCCTGGAACAGCCCAGTTG 

GAAATCAGCTTGACCCGATTCAGAGAGAACCTGTGTGCTCAGCTCTTAACAG 

TGCAATATTAGAAACCCACAATCTGCCAAAGCAACCTCCACTTGCCCTAGCA 

ATGGGACAGGCCACACAATGTCTAGGACTGATGGCTCGATCAGGAATTGGA 

TCCTGCGCATTTGCCACAGTGGAAGACTACCTACATTAG 


GATCGAGATCCATGGGAAGGCAGGCCTGTTTTTAGAAGGCCAGATCCACCC 
CGAGTTGGAAGGAGTCGAGATTGTCATCAGTGAAAAGGGGGCAAGTTCACC 
GCTGATCACAGTCTTTACTGATGACAAAGGTGCCTACAGTGTTGGCCCCCTG 
CACAGTGACCTGGAGTACACGGTGACCTCACAGAAGGAGGGCTATGTTCTG 
ACTGCGGTGGAAGGAACCATCGGAGACTTCAAGGCCTATGCCCTGGCAGGC 
GTAAGCTTTGAGATAAAAGCTGAGGATGACCAGCCCCTCCCGGGAGTCCTC 
DNGILTFSNLSPGQYYFKP

MMKEFRFEPSSQMIEVQE 

GQNLKITITGYRTAYSCYGT 

VSSLNGEPEQGVAMEAVG 

QNDCSIYGEDTVTDEEGKF 

RLRGLLPGCVYHVQLKAEG 

NDHIERALPHHRVIEVGNN 

DIDDVNIIVFRQINQFDLSG 

NVITSSEYLPTLWVKLYKSE 

NLDNPIQTVSLGQSLFFHFP 

PLLRDG EN YVVLLDSTLPR 

oU Y U Y ILrVJ Vor 1 MVLa Y nr\ 

HTTLIFNPTRKLPEQDIAQG 
SYIALPLTLLVLLAGYNHDK 
LIPLLLQLTSRLQGVRALGQ 
AASDNSGPEDAKRQAKKQ 
KTRRT* 


LGLHSPIALDVLSEAFEESL 

VARDWSRALQLTEVYGRD 

VDDLSSIKDAVLSCAVAYD 

KEGWQYLFPVKDASLRSRL 

ALQFVDRW PLESCLEILAY 

CISDTAVQEGLKCELQRKL 

AELQVYQKILGLQSPPVWC 

DWQTLRSCCVEDPSTVMN 

MILEAQEYELCEEWGCLYP 

IPREHLISLHQKHLLHLLER 

RDHDKALQLLRRIPDPTMC 

LEVTEQSLDQHTSLATSHF 

LANYLTTHFYGQLTAVRHR 

EIQALYVGSKILLTLPEQHR 

ASYSHLSSNPLFMLEQLLM 

NMKVDWATVAVQTLQQLL 

VGQEIGFTMDEVDSLLSRY 

AEKALDFPYPQREKRSDSV 

IHLQEIVHQAADPETLPRSP 
NFHLPREVYVFF ALFYVFT
SLSHTHTRIHTHSLFLIK*DY 
TTHILSLAFLLKSISKRILCVS 
EAGATSFF*LAAW RSI ECLS 
SCVPSGWIVCLFLVX 


NPVPLYAPNLSPPADSRIH 

VPASGYCCLECGDAFALEK 

SLSQHYGRRSVHIEVLCTL 

CSKTLLFFNKCSLLRHARD 

HKSKGLVMQCSQLLVKPIS 

ADQMFVSAPVNSTAPAAPA

PSSSPKHGLTSGSASPPPP 

ALPLYPDPVRLIRYSIKCLE 

CHKQMRDYMVLAAHFQRT 

TEETEGLTCQVCQMLLPNQ 

CSFCAHQRIHAHKSPYCCP 

ECGVLCRSAYFQTHVKEN 

CLHYARKVGYRCIHCGVVH 

LTLALLKSHIQERHCQVFHK 

CAFCPMAFKTASSTADHSA 

TQHPTQPHRPSQLIYKCSC 

EMVFNKKRHIQQHFYQNVS 

KTQVGVFKCPECPLLFVQK 
GCTGCCATGACCTACTGCAGAGCTGCCCGCCAGTTGGTGGAGAAAGAGAAG
TACAGTGAGATCCAGCAACTGCTCAAATGTGTCAGTGAGTCAGGCATGGCAG 
CCAAAAGTGACGGGGACACCATCCTCCTCAACTGCCTGGAAGCGTTCAAGA 
GAATTCCGCCCCAGGAGCTGGAGGGCCTGATCCAGGCAATACACAATGATG 
ACAACAAGGTTCGGGCCTACCTGATATGTTGCAAACTGCGTTCTGCCTACTT 
GATTGCTGTGAAGCAAGAACACTCACGGGCCACAGCCCTTGTCCAGCAGGT 
GCAGCAGGCCGCCAAGAGCAGCGGGGATGCAGTAGTGCAAGACATCTGTG 
CCCAGTGGCTTCTGACAAGCCACCCCCGGGGTGCCCATGGCCCAGGCTCC 
AGGAAGTGA 


CTCCCTCTCTGCCTAGCTGGCTTTCTGTAAATAATTATTTGTGTCATAGCT1 A 
CAGCTTTTTAAACATTTTCACTTTTATTATTTCATTTAATTTTCACACCAGCCCC 
GAAAAGTGTTTTTTCCACTTTACAAATTAAGATGCAGAAGCTCAGCAATANNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGGGAAGCTCAGCAA 
TATTAAATGTCTGGGCCAATTACGTAATCAGTAAGC


AATTTCCACCTCCCAAGGGAAGTTTATGTATTTTTCTAGGCCCTTTTCTATGT 

CTTTACATCTCTGTCTCACACACACACACGTATACACACACACAGTTTATTTTT 

AATAAAATAGGATTATACCACACACATCCTGTCACTTGCTTTTTTGCTTAAGA 

GTATATCTAAGAGAATCCTTTGTGTCAGTGAAGCTGGAGCTACCTCATTCTTT 

TAACTGGCTGCGTGGCGTTCCATTGAGTGTCTGTCATCATGTGTTTAGCCGA 

GTGGATGGATAGTCTGCTTGTTTTTAGTTTNTGC 


CAACCCCGTGCCCCTCTATGCGCCAAATCTCAGCCCGCCTGCGGACAGCAG 

GATCCACGTGCCGGCCAGTGGGTACTGCTGCCTGGAGTGTGGAGACGCATT 

TGCCTTAGAGAAGAGCCTGAGCCAGCACTATGGCCGGCGGAGCGTCCACAT 

TGAGGTACTGTGCACACTGTGCTCCAAGACGCTGCTCTTCTTCAACAAGTGC 

AGCCTGCTCCGGCACGCCCGTGACCACAAGAGCAAGGGGCTCGTCATGCA 

GTGTTCCCAGCTGCTGGTGAAGCCTATCTCTGCGGACCAAATGTTCGTGTCG 

GCCCCTGTGAACTCCACGGCACCAGCAGCCCCAGCCCCTTCATCCTCTCCC 

AAACATGGCCTCACTTCGGGCAGTGCCAGTCCCCCTCCTCCAGCCTTGCCA 

CTCTACCCAGACCCTGTGAGGCTCATCCGGTACTCAATCAAGTGTCTTGAAT 

GTCACAAGCAGATGCGGGACTACATGGTCCTGGCTGCACATTTCCAGAGGA 

CAACAGAGGAGACAGAGGGGCTGACCTGCCAGGTATGCCAGATGCTGCTGC 

CCAACCAGTGCAGTTTCTGTGCCCACCAGCGGATTCATGCACACAAGTCCCC 

CTACTGCTGCCCGGAGTGTGGGGTCCTCTGCCGCTCTGCCTACTTCCAGAC 

CCATGTAAAGGAGAATTGCCTGCACTATGCCCGCAAGGTGGGCTACAGGTG 

CATCCACTGTGGTGTCGTCCACCTGACCTTGGCCTTGCTGAAAAGCCACATC 

CAGGAGCGACACTGCCAGGTTTTCCACAAATGTGCATTCTGCCCCATGGCCT 

TCAAGACTGCCAGCAGCACTGCAGACCACAGTGCCACCCAGCACCCCACCC 

AGCCCCACAGACCCTCCCAGCTCATTTATAAGTGCTCCTGTGAAATGGTCTT 
XXYX


AQAVIPYQAVKIYSLVFFXK 
IKSVIHFGLIFL*CILLFEVNF 
FLVLKFSSSCSSXXC*ENCX 
SH*XYFGYLX*XXYXXYXVX 
XNXTLRXVAXRRKXXXXXX 
RE 


RVGMGWASVRPSDPPHVC 

CPKPRRSLVWYSVSGLG** 

LDTRLNLGLQFPTFRLLWV 

CPGVSN*PGSQGCRLFPP 

GWGAACTCQGSFAGLFIXI 

CFR 


PPPPTHVHTVSAQCLLFFF 
KXXFXXXXXXXXKXXXXXX 
FXXFFXFXFFX'XXXFXKRX 
QKDDKEPQPVKKTVTGTD

ADLRRLSLKNAKQLLRKFG 
MSTEQARSGEGPMSKFAR
RCETVRKPAVIDAYVRIRTT

KDEEFIRKFALFDEQHREE 

MRKERRRIQEQLRRLKRN 

QEKEKLKGPPEKKPKKMKE 

RPDLKLKCGACGAIGHMRT 
TACATATTTGTTGAATGGCTTTGACANACTTGCTGATAGTGATATGAACATTA
NNGTCCAAGCTGAGGTGGTCTCAAATGGAGATGAGGAACTTGTTGGGAACT 
GAAGNACAGGTGACTCTTGTTATGTTTTANCCAAGACCACTGTCNTCATTTTG 
CCTNTGCCCTANANATTTNTGGAACTTTNACNTTGAGANANATGATNCANGAT 
CTTGGNNGANGANNTNNNTAANNGNNNTATATTNN 


GCACAAGCCGTCATACCATACCAGGCAGTAAAAATTTACTCCTTAGTTTTU I I 

CTANAAATAGATTAAGTCTGTGATCCATTTTGGGTTAATTTTTCTGTGATGTAT 

ACTATTGTTTGAGGTTAATTTTTTTCTAGTTTTAAAATTTTCATCCAGTTGTTCC 

AGCNTCNCTTGTTGAGAAAATTGTTNTTCCCATTAANATTACTTTGGATACCT 

NGNGTGANGNNTATATGNGGNCTATANNGTGTNGNGNAACNCGACGCTGCG 

CAGNGTGGCNTANCGTCGTAAGNNANGTAGNGNANAGNGCCGNGAGA 


AGAGTGGGGATGGGCTGGGCCTCTGTTCGTCCGTCCGACCCCCCTCATGTG 
TGCTGCCCCAAACCTCGCCGCTCCCTAGTTTGGTATTCTGTGTCCGGCCTGG 
GGTAGTAGCTGGACACCAGACTCAATCTTGGGCTCCAGTTCCCGACTTTTCG 
CCTCCTCTGGGTCTGTCCTGGGGTCAGTAATTAACCCGGGTCCCAGGGGTG 
TCGTCTTTTCCCTCCAGGGTGGGGCGCTGCCTGTACATGCCAGGGATCTTTT 
GCAGGGCTTTTCATCCANATTTGCTTCAGGG


CCTCCTCCTCCAACACACGTGCACACAGTGTCTGCCCAATGCCTACTTTTTTT 
TTTTAAANGAAANTTTNANTTNGNAANTANAANNNGGNTAAAANGNCNTNNNC 
NTNTANCCTTTTNNNGTTTTTTTTNNTTTTNTTTTTTTNGNTAANNNANNNGTT 
TTTNAAAAAGGTNNAAAAAAATNTTNACANTTTTNGGGGNTAANCTTTTAATTT 
AAAACTTNGNCCCCTTAAATTANCCACCNCAANNTANCAAATTTTNAAGGTTT 
TNAAAAAANNGTTTGGGA 


AGCAGAAGGATGATAAAGAACCGCAGCCAGTGAAGAAGACAGTGACAGGAA 
CAGATGCAGACCTTCGTCGCCTTTCCCTGAAAAATGCCAAGCAACTTCTACG 
TAAATTTGGTGTGCCTGAGGAAGAGATTAAAAAGTTGTCCCGCTGGGAAGTG 
ATTGATGTGGTGCGCACAATGTCAACAGAACAGGCTCGTTCTGGAGAGGGG 
CCCATGAGTAAATTTGCCCGTGGATCAAGGTTTTCTGTGGCTGAGCATCAAG 
AGCGTTACAAAGAGGAATGTCAGCGCATCTTTGACCTACAGAACAAGGTTCT 
GTCATCAACTGAAGTCTTATCAACTGACACAGACAGCAGCTCAGCTGAAGAT 
AGTGACTTTGAAGAAATGGGAAAGAACATTGAGAACATGTTGCAGAACAAGA 
AAACCAGCTCTCAGCTTTCACGTGAACGGGAGGAACAGGAGCGGAAGGAAC 
TACAGCGAATGCTACTGGCAGCAGGCTCAGCAGCATCCGGAAACAATCACA 
GAGATGATGACACAGCTTCCGTGACTAGCCTTAACTCTTCTGCCACTGGACG 
CTGTCTCAAGATTTATCGCACGTTTCGAGATGAAGAGGGGAAAGAGTATGTT 
CGCTGTGAGACAGTCCGAAAACCAGCTGTCATTGATGCCTATGTGCGCATAC 
GGACTACAAAAGATGAGGAATTCATTCGAAAATTTGCCCTTTTTGATGAACAA 
CATCGGGAAGAGATGCGAAAAGAACGGCGGAGGATTCAAGAGCAACTGAGG 
CGGCTTAAGAGGAACCAGGAAAAGGAGAAGCTTAAGGGTCCTCCTGAGAAG 
VTKSSPRALAARERKRSRG
GCAGAAGTGGGCATCTCGGAGCCAGCCTGGAAGCCTCCAGCAGGGCTATTC
CCCCACACCCCTACCCACTGGAAAGGAGTGTTGTAAGACTAGGTTTTGGCTA 
A 


NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTCACT 

ATGTATCTTCTTTTAAATGTAAGTTTTGTGTTTTATAATTTTTCACATCTACTGA 

ATTAAATCTGAACAGTGACTTTGTGCAAAATAAATTTTGCTGTCCATTCTTGCC 

AAAAAGTCCTGAATGTCCAGGATGATTTCTCCAGGACATCTCTATTGCTCCCA 

AGTTTCAAACAGTTTTTTGGGAGCCAAAACCTCAGGATTTACCCTANATCTGG 

TTAACATTTTGAAAANATACANG 


GGACCCTGTCTCAGTGGACACGGCCCGACTGGAACACCTCTTTGAGTC1 CG 

TGCCAAAGAGGTGCTGCCCTCCAAGAAAGCTGGAGAGGGCCGCCGGACAAT 

GACCACAGTGCTGGACCCCAAGCGCACGAACGCCATCAACATCGGCCTAAC 

CACACTGCCACCTGTGCATGTCATTAAGGCTGCTCTGCTCAACTTTGATGAG 

TTTGCTGTCAGCAAGGATGGCATTGAGAAGCTACTGACCATGATGCCCACGG 

AGGAAGAGCGGCAGAAGATTGAGGGAGCCCAGCTGGCCAACCCTGACATAC 

CCCTGGGCCCAGCCGAGAACTTCCTGATGACTCTTGCCTCCATTGGCGGCC 

TCGCTGCTCGTCTACAACTCTGGGCCTTCAAGCTGGACTATGACAGCATGGA 

GCGGGAAATTGCTGAGCCACTGTTTGACCTGAAAGTGGGTATGGAACAGCT 

GGTACAGAATGCCACCTTCCGCTGCATCCTGGCTACCCTCCTAGCTGTGGG 

CAACTTCCTCAATGGCTCCCAGAGCAGCGGCTTTGAGCTGAGCTACCTGGA

GAAGGTGTCAGATGTGAAGGACACGGTGCGTCGACAGTCACTGCTACACCA 

TCTCTGCTCCCTAGTGCTCCAGACCCGGCCTGAGTCCTCTGACCTCTATTCA 

GAAATCCCTGCCCTGACCCGCTGTGCCAAGGTGGACTTTGAACAGCTGACT 

GAGAACCTGGGGCAGCTGGAGCGCCGGAGCCGGGCAGCCGAGGAAAGCC 

TGCGGAGCTTGGCCAAGCATGAGCTGGCCCCAGCCCTGCGTGCCCGCCTC 

ACCCACTTCCTGGACCAGTGTGCCCGCCGTGTTGCCATGCTAAGGATAGTG 

CACCGCCGTGTCTGCAATAGGTTCCATGCCTTCCTGCTCTACCTGGGCTACA 

CCCCGCAGGCGGCCCGTGAAGTGCGCATCATGCAGTTCTGCCACACGCTGC 

GGGAATTTGCGCTTGAGTATCGGACTTGCCGGGAACGAGTGCTACAGCAGC 

AGCAGAAGCAGGCCACATACCGTGAGCGCAACAAGACCCGGGGACGCATG 

ATCACCGAGACAGAGAAGTTCTCAGGTGTGGCTGGGGAAGCCCCCAGCAAC 

CCCTCTGTCCCAGTAGCAGTGAGCAGCGGGCCAGGCCGGGGAGATGCTGA 

CAGTCATGCTAGTATGAAGAGTCTGCTGACCAGCAGGCTTGAGGACACCAC 

ACACAATCGCCGCAGCAGAGGCATGGTCCAGAGCAGCTCCCCAATCATGCC 

CACAGTGGGGCCCTCCACTGCATCCCCAGAAGAACCCCCAGGCTCCAGTTT 

ACCCAGTGATACATCAGATGAGATCATGGACCTTCTGGTGCAGTCAGTGACC 

AAGAGCAGTCCTCGTGCCTTAGCTGCTAGGGAACGCAAGCGTTCCCGCGGC 

AACCGCAAGTCTTTGAGAAGGACGTTGAAGAGTGGGCTCGGAGATGACCTG 

GTGCAGGCACTGGGACTAAGCAAGGGTCCTGGCCTGGAGGTGTGA 
GCAGGAAGCTCAGAGTATCGATGAAATCTACAAATACGACAAGAAACAGCAG

CAAGAAATCCTGGCGGCGAAGCCCTGGACTAAGGATCACCATTACTTTAAGT 

ACTGCAAAATCTCAGCATTGGCTCTGCTGAAGATGGTGATGCATGCCAGATC 

GGGAGGCAACTTGGAAGTGATGGGTCTGATGCTAGGAAAGGTGGATGGTGA 

AACCATGATCATTATGGACAGTTTTGCTTTGCCTGTGGAGGGCACTGAAACC 

CGAGTAAATGCTCAGGCTGCTGCATATGAATACATGGCTGCATACATAGAAA 

ATGCAAAACAGGTTGGCCGCCTTGAAAATGCAATCGGGTGGTATCATAGCCA 

CCCTGGCTATGGCTGCTGGCTTTCTGGGATTGATGTTAGTACTCAGATGCTC 

AATCAGCAGTTCCAGGAACCATTTGTAGCAGTGGTGATTGATCCAACAAGAA 

CAATATCCGCAGGGAAAGTGAATCTTGGCGCCTTTAGGACATACCCAAAGGG 

CTACAAACCTCCTGATGAAGGACCTTCTGAGTACCAGACTATTCCACTTAATA 

AAATAGAAGATTTTGGTGTACACTGCAAACAATATTATGCCTTAGAAGTCTCA 

TATTTCAAATCCTCTTTGGATCGCAAATTGCTTGAGCTGTTGTGGAATAAATA 

CTGGGTGAATACGTTGAGTTCTTCTAGCTTGCTTACTAATGCAGACTATACCA 

CTGGTCAGGTCTTTGATTTGTCTGAAAAGTTAGAGCAGTCAGAAGCCCAGCT 

GGGACGAGGGAGTTTCATGTTGGGTTTAGAAACGCATGACCGAAAATCAGAA 

GACAAACTTGCCAAAGCTACAAGAGACAGCTGTAAAACTACCATAGAAGCTA

TCCATGGATTGATGTCTCAGGTTATTAAGGATAAACTGTTTAATCAAATTAACA 

Tfrrr.TTAA 


TTG G G G C ATCTTG G C AGG AG CTTTG G ATTTCTTT AG G G AA ATG G C A A 1 UAUA 
TGGGGCAGAGTGTTTTTTGCTGAGGGAATCAGAATGATCCCTCAAACAGCAC
CTTTGATCTCTATTCTCTGCTAAAGATGGTGCTTCCTCTACTTCCCCAGACCC 
CCGTGTCTGTTCCATTTCCATGAATTTTTCATCAGGGTCACAGGACAAAGGTT 
TTAGTCTTTGGTTCTAATGAGACCTCTGACTTGGCTCTGGATGACTATGAAAC 
TAGTGAATGCATTTGTCTTTTCTGGAATCCN 


GCT AATATGGT AGCT ATTG ATAGCTTACTATGTATCAG ATCCN N NNNIMNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGAGTA 
G CT AG G ACT AC AGTG GTG AG CC ACC ATG CCC AG CT A ATTTTTTTTTTTTTTT N 
NNNAAAAAGGGNNTTNNTTNTTNTNGCCCNGGNNGGTNTNAANCTCNTNNC 
r.TNANGGNATTNNCCCNCCTNGNCCNCCAAANGGGCNGGANTT 


GGGCGAGAGGACTGAGTGTGCTGAGCCCCCCCGGGACGAACCCCUGGU I G 
ATGGAGCTCTGAAGCGGGCAGAGGAGCTCAAGACTCAGGCCAATGACTACT 
TCAAAGCCAAGGACTACGAGAACGCCATCAAGTTCTACAGCCAGGCCATCG 
AGCTGAACCCCAGCAATGCCATCTACTATGGCAACCGCAGCCTGGCCTACC 
TGCGCACTGAGTGCTATGGCTACGCGCTGGGAGACGCCACGCGGGCCATT 
GAGCTGGACAAGAAGTACATCAAGGGTTATTACCGCCGGGCTGCCAGCAAC 
ATGGCACTGGGCAAGTTCCGGGCCGCGCTGCGAGACTACGAGACGGTGGT 
CAAGGTGAAGCCCCATGACAAGGATGCCAAAATGAAATACCAGGAGTGCAA 
GAACTCTATTCCCAGCTGAAAGCCAAGGAAGAGACTTATAATCAACTACTTGA

CAAGGGCAGACTCATGCTTCTAAGCCGTGACGACTCTGGGTCTGGCTCCAA 

GACAGAACAGAGTGTAGCACTTTTGGAGCAGAAGTGGCATGTGGTCAGCAG 

TAAGATGGAAGAAAGAAAGTCAAAGCTGGAAGAGGCCCTCAACTTGGCAACA 

GAATTCCAGAATTCCCTACAAGAATTTATCAACTGGCTCACTCTAGCAGAGCA 

GAGTTTAAACATCGCTTCTCCACCAAGCCTGATTCTAAATACTGTCCTTTCCC 

AGATAGAAGAGCACAAGGTTTTTGCTAATGAAGTAAATGCTCATCGAGACCA 

GATCATTGAGCTGGATCAAACTGGGAATCAATTAAAGTTCCTTAGCCAAAAG 

CAGGATGTTGTTCTGATCAAGAATTTGTTGGTGAGCGTGCAGTCTCGATGGG 

AGAAGGTTGTCCAGCGATCTATTGAAAGAGGGCGATCACTAGATGATGCCAG 

GAAGCGGGCAAAACAATTCCATGAAGCTTGGAAAAAACTGATTGACTGGCTA

GAAGATGCAGAGAGTCACCTGGACTCAGAACTAGAGATATCCAATGACCCAG 

ACAAAATTAAACTTCAGCTTTCTAAGCATAAGGAGTTTCAGAAGACTCTTGGT 

GGCAAGCAGCCTGTGTATGATACCACAATTAGAACTGGCAGAGCACTGAAAG 

AAAAGACTTTGCTTCCCGAAGATACTCAGAAACTTGACAATTTCCTAGGAGAA 

GTCAGAGACAAATGGGATACTGTTTGTGGCAAGTCTGTGGAGCGGCAGCAC 

AAGTTGGAGGAAGCCCTGCTCTTTTCGGGTCAGTTCATGGATGCTTTGCAGG 

CATTGGTTGACTGGTTATACAAGGTGGAGCCACAGCTGGCTGAGGACCAGC 

CCGTGCACGGGGACCTTGACCTCGTCATGAACCTCATGGATGCACACAAGG 

TTTTCCAGAAGGAACTGGGAAAGCGAACAGGAACCGTTCAGGTCCTGAAGC 

GGTCAGGCCGAGAGCTGATTGAGAATAGTCGAGATGACACCACTTGGGTAA 

AAGGACAGCTCCAGGAACTGAGCACTCGCTGGGACACTGTCTGTAAACTCT 

CTGTTTCCAAACAAAGCCGGCTTGAGCAGGCCTTAAAACAAGCGGAAGTGTT 

TCGAGACACAGTCCACATGCTGTTGGAGTGGCTTTCTGAAGCAGAGCAAAC 

GCTTCGCTTTCGGGGAGCACTTCCTGATGACACAGAGGCCCTGCAGTCTCT 

CATTGACACCC 


GGAAAAAGAAGAGCTGCCACGTGCCGTGGGTACCCAGACATTGAGTGGTGC 

TGGTCTCCTCAAGATGTTCAACAAAGCCACAGATGCCGTCAGCAAAATGACC 

ATCAAGATGAATGAATCAGACATTTGGTTTGAGGAGAAGCTCCAGGAGGTAG 

AGTGTGAGGAGCAGCGCTTACGGAAACTGCATGCTGTTGTAGAAACTCTAGT 

CAACCATAGGAAAGAGCTAGCGCTGAACACAGCCCAGTTTGCAAAGAGTCTA 

GCCATGCTTGGGAGCTCTGAGGACAACACGGCATTGTCACGGGCACTCTCC 

CAGCTGGCTGAGGTGGAAGAAAAAATTGAGCAGCTCCACCAGGAACAGGCC 

AACAATGACTTCTTCCTCCTTGCTGAGCTCCTGAGTGACTACATTCGCCTCCT 

GGCCATAGTCCGCGCTGCCTTCGACCAGCGCATGAAGACATGGCAGCGCTG 

GCAGGATGCCCAAGCCACACTGCAGAAGAAGCGGGAGGCCGAGGCTCGGC 

TGCTGTGGGCCAACAAGCCTGATAAGCTGCAGCAGGCCAAGGACGAGATCC 

TCGAGTGGGAGTCTCGGGTGACTCAATATGAAAGGGACTTCGAGAGGATTT 

CAACAGTGGTCCGAAAAGAAGTGATACGGTTTGAGAAAGAGAAATCCAAGGA 
CATTTGTAAAACTGAAGCAGCTATGCGTTAATGAGCCTTTTGAAGAAACTGAA

GAGAAATGGTTATCTTCACTGGAAAATACTCGATGGTTAGAATATGTAAGGG 

CATTCCTTAAGCATTCAGCAGAACTTGTATACATGCTAGAAAGCAAACATCTC 

TCTGTAGTCCTACAAGAGGAGGAAGGAAGAGACTTGAGCTGTTGTGTAGCTT 

CTCTTGTTCAAGTGATGCTGGATCCCTATTTTAGGACAATTACTGGATTTCAG

AGTCTGATACAGAAGGAGTGGGTCATGGCAGGATATCAGTTTCTAGACAGAT 

GCAACCATCTAAAGAGATCAGAGAAAGAGTCTCCTTTATTTTTGCTATTCTTG 

GATGCCACCTGGCAGCTGTTAGAACAATATCCTGCAGCTTTTGAGTTCTCCG 

AAACCTACCTGGCAGTGTTGTATGACAGCACCCGGATCTCACTGTTTGGCAC 

CTTCCTGTTCAACTCCCCTCACCAGCGAGTGAAGCAAAGCACGGTCAGTAG 

nATAAAAAGTTGTACAAAACAAGATTATTTTCCTTCACGAGTTTGA 


GGAAGAAGAAGAGACAGAGCTGCCCACTGTGCCCCCAG 1 UUUUAUAuAA^o 
CAGTCCCATGCCAGACCCTTGCAGTAGTGAACTGGATGCCATGATGCTGGG 
GCCCCGTGGGAAGACCTATGCTTTCAAGGGGGACTATGTGTGGACTGTATC 
AGATTCAGGACCGGGCCCCTTGTTCCGAGTGTCTGCCCTTTGGGAGGGGCT 
CCCCGGAAACCTGGATGCTGCTGTCTACTCGCCTCGAACACAATGGATTCAC 
TTCTTTAAGGGAGACAAGGTGTGGCGCTACATTAATTTCAAGATGTCTCCTG 
GCTTCCCCAAGAAGCTGAATAGGGTAGAACCTAACCTGGATGCAGCTCTCTA 
TTGGCCTCTCAACCAAAAGGTGTTCCTCTTTAAGGGCTCCGGGTACTGGCAG 
TGGGACGAGCTAGCCCGAACTGACTTCAGCAGCTACCCCAAACCAATCAAG 
r.nTTTnTTTACGGGAGTGCCAAACCAGCCC 


GGCTCCCTTGACCTTCCAAGAGGTGCAGGCTGGTGCGGCTGACA 1 (JCJCiUU I 

CTCCTTCCATGGCCGCCAAAGCTCGTACTGTTCCAATACTTTTGATGGGCCT 

GGGAGAGTCCTGGCCCATGCCGACATCCCAGAGCTGGGCAGTGTGCACTTC 

GACGAAGACGAGTTCTGGACTGAGGGGACCTACCGTGGGGTGAACCTGCG 

CATCATTGCAGCCCATGAAGTGGGCCATGCTCTGGGGCTTGGGCACTCCCG 

ATATTCCCAGGCCCTCATGGCCCCAGTCTACGAGGGCTACCGGCCCCACTT 

TAAGCTGCACCCAGATGATGTGGCAGGGATCCAGGCTCTCTATGGCAAGAA 

GAGTCCAGTGATAAGGGATGAGGAAGAAGAAGAGACAGAGCTGCCCACTGT 

GCCCCCAGTGCCCACAGAACCCAGTCCCATGCCAGACCCTTGCAGTAGTGA 

ACTGGATGCCATGATGCTGGGTGAGGCCCCTCCCCTCCAGGCTGTTGGCAG 

GCGGTGGGGGCAGCCTGCTGATCCTGAGGCCTGGACAAATGGGAGTGACA 

TGGGACTTCAGCATGAGCAATGGAGGGCCCCGTGGGAAGACCTATGCTTTC 

AAGGGGGACTATGTGTGGACTGTATCAGATTCAGGACCGGGCCCCTTGTTC 

CGAGTGTCTGCCCTTTGGGAGGGGCTCCCCGGAAACCTGGATGCTGCTGTC 

TAr.Tnnr.GTCGAACACAATGGATTCACTTCTTTAA 


ATGGAGAAATATTCAATAATGAAGAGCATGAA 1 AIGUAI UUAAAAAAAuuAAA 
AAGGACCATTTTAGAAATGACACAAATACTCAAAAGGCATGGCTATTGCACCT
TGGGAGAAGCCTTTAATCGGTTAGACTTCTCAAGTGCAATTCAAGATATCCG 
RPIRNLTFQDLHLHHGGHQ
AANTSHDLAQRHGLESASD 
YCCLVVEIRHHHSEHRVHG
AMELQVQTGKDAPSNCVV 
YPSSSQDSENITAAALATG 
ACIVGILCLPLILLLVYKQRQ 
AAS 
DPPRLECVAFSHQNLKLKW
EAGEGPLSQEYIFTTPKSV

PAALKAPKIEKVNDHICEIT 

WECLQPMKGDPVIYSLQV 

MLGKDSEFKQIYKGPDSSF 

RYSSLQLNCEYRFRVCAIR 


ARLKULbALLINbrvbAALb 1 A 

LSEKRTLEGELHDLRGQVA 
RVDAENRLQTMKEELDFQ
TYSAKLDNARQSAERNSNL
EDSLARERDTSRRLLAEKE

REMAEMRARMQQQLDEY 

QELLDIKLALDMEIHAYRKL 

LEGEEERLRLSPSPTSQRS 

RGRASSHSSQTQGGGSVT

KKRKLESTESRSSFSQHAR 

TSGRVAVEEVDEEGKFVRL 

RNKSNEDQSMGNWQIKRQ 
GATGTGACCTTCTACAAGACGTGGTACCGCAGCTCGAGGGGCGAGGTGCAG
ACCTGCTCAGAGCGCCGGCCCATCCGCAACCTCACGTTCCAGGACCTTCAC 
CTGCACCATGGAGGCCACCAGGCTGCCAACACCAGCCACGACCTGGCTCAG 
CGCCACGGGCTGGAGTCGGCCTCCGACCACCATGGCAACTTCTCCATCACC 
ATGCGCAACCTGACCCTGCTGGATAGCGGCCTCTACTGCTGCCTGGTGGTG 
GAGATCAGGCACCACCACTCGGAGCACAGGGTCCATGGTGCCATGGAGCTG 
CAGGTGCAGACAGGCAAAGATGCACCATCCAACTGTGTGGTGTACCCATCC 
TCCTCCCAGGATAGTGAAAACATCACGGCTGCAGCCCTGGCTACGGGTGCC 
TGCATCGTAGGAATCCTCTGCCTCCCCCTCATCCTGCTCCTGGTCTACAAGC 
AAAnRnAGGCAGCCTCCAA 


CCTTGGAGCTGGTCCTTTCAGCCATATGATAAAATTAAAAAC 1 AAGCU 1 U 1 UU 
CTCCTGATCCACCTCGTCTGGAATGTGTTGCCTTTAGCCACCAGAACCTTAA 
GCTGAAATGGGGAGAAGGAACTCCAAAGACATTGTCAACCGATTCTATTCAG 
TACCACCTTCAGATGGAGGATAAGAATGGACGGTTTGTATCCCTATACAGAG 
GACCATGTCATACATACAAAGTACAAAGACTTAATGAGTCAACATCCTATAAA 
TTCTGTATTCAAGCTTGTAATGAAGCTGGGGAAGGTCCCCTCTCCCAAGAAT 
ATATTTTCACTACTCCAAAATCTGTCCCAGCTGCCTTGAAAGCCCCCAAAATA 
GAGAAAGTAAATGATCACATTTGTGAAATTACATGGGAGTGTTTACAGCCAAT 
GAAAGGTGATCCAGTTATTTACAGTCTTCAAGTTATGTTGGGAAAAGATTCAG 
AATTCAAACAGATTTACAAGGGTCCCGACTCTTCCTTCCGGTATTCCAGCCTT 
rAnr.TnAAnTRTGAATATCGCTTCCGTGTATGTGCCATTCGCC 


GGCTCGGCTGAAGGACCTGGAGGCTCTGCTGAACTCCAAGGAGGGUUUAU 

TGAGCACTGCTCTCAGTGAGAAGCGCACGCTGGAGGGCGAGCTGCATGATC 

TGCGGGGCCAGGTGGCCAAGCTTGAGGCAGCCCTAGGTGAGGCCAAGAAG 

CAACTTCAGGATGAGATGCTGCGGCGGGTGGATGCTGAGAACAGGCTGCAG 

ACCATGAAGGAGGAACTGGACTTCCAGAAGAACATCTACAGTGAGGAGCTG 

CGTGAGACCAAGCGCCGTCATGAGACCCGACTGGTGGAGATTGACAATGGG 

AAGCAGCGTGAGTTTGAGAGCCGGCTGGCGGATGCGCTGCAGGAACTGCG 

GGCCCAGCATGAGGACCAGGTGGAGCAGTATAAGAAGGAGCTGGAGAAGA 

CTTATTCTGCCAAGCTGGACAATGCCAGGCAGTCTGCTGAGAGGAACAGCA 

ACCTGGTGGGGGCTGCCCACGAGGAGCTGCAGCAGTCGCGCATCCGCATC 

GACAGCCTCTCTGCCCAGCTCAGCCAGCTCCAGAAGCAGCTGGCAGCCAAG 

GAGGCGAAGCTTCGAGACCTGGAGGACTCACTGGCCCGTGAGCGGGACAC 

CAGCCGGCGGCTGCTGGCGGAAAAGGAGCGGGAGATGGCCGAGATGCGG 

GCAAGGATGCAGCAGCAGCTGGACGAGTACCAGGAGCTTCTGGACATCAAG 

CTGGCCCTGGACATGGAGATCCACGCCTACCGCAAGCTCTTGGAGGGCGAG 

GAGGAGAGGCTACGCCTGTCCCCCAGCCCTACCTCGCAGCGCAGCCGTGG 

CCGTGCTTCCTCTCACTCATCCCAGACACAGGGTGGGGGCAGCGTCACCAA 

AAAGCGCAAACTGGAGTCCACTGAGAGCCGCAGCAGCTTCTCACAGCACGC 
PPPGHSLKFSUVYUniUU

RPSTNELPLFDFPVKEVFEL 

LGVENVFQLFTCALLEFQIL 

LYSQHYQRLMTVAETITAL 

MFPFQWQHVYVPILPASLL 

HFLDAPVPYLMGLHSNGLD 

DRSKLELPQEANLCFVDID 

NHFIELPEDLPQFPNKLEFV 

QEVSEILMAFGIPPEGNLHC 

SESASKLKRLRASELVSDK 

RNGNIAGSPLHSYELLKEN 

ETIARLQALVKRTGVSLEKL 

EVREDPSSNKDLKVQCDE 

EELRIYQLNIQIREVFANRFT 

QMFADYEVFVIQPSQDKES 

WFTNREQMQNFDKASFLS 

DQPEPYLPFLSRFLETQMF 

ASFIDNKIMCHDDDDKDPV 

LRVFDSRVDKIRLLNVRTPT 

LRTSMYQKCTTVDEAEKAI 

ELRLAKIDHTAIHPHLLDMKI 

GQGKYEPGFFPKLQSDVLS 

TGPASNKWTKRNAPAQWR 

RKDRQKQHTEHLRLDNDQ 

REKYIQEARTMGSTIRQ 


WDSTKISKAYYKAMVIb 1 VV 
1 CYWLRKRHLMHETDSRVP | 
ACGCACTAGCGGGCGCGTGGCCGTGGAGGAGGTGGATGAGGAGGGCAAGT

TTGTCCGGCTGCGCAACAAGTCCAATGAGGACCAGTCCATGGGCAATTGGC 

AGATCAAGCGCCAGAATGGAGATGATCCCTTGCTGACTTACCGGTTCCCACC 

AAAGTTCACCCTGAAGGCTGGGCAGGTGGTGACGATCTGGGCTGCAGGAGC 

TGGGGCCACCCACAGCCCCCCTACCGACCTGGTGTGGAAGGCACAGAACA 

CCTGGGGCTGCGGGAACAGCCTGCGTACGGCTCTCATCAACTCCACTGGGG 

AAGAAGTGGCCATGCGCAAGCTGGTGCGCTCAGTGACTGTGGTTGAGGACG 

ACGAGGATGAGGATGGAGATGACCTGCTCCATCACCACCATGTGAGTGGTA 


CCCACCTCCTGGCCGGTCCTTG AAGTTTTCTGGGGTCTATGGGCGAA 1 AA 1 U 

TGCCAGAGACCAAGTACCAATGAGCTTCCCCTATTTGACTTTCCTGTCAAAG 

AGGTTTTTGAACTGCTCGGGGTGGAGAATGTGTTTCAGCTTTTTACTTGTGC

CCTTCTGGAGTTTCAAATCCTGCTCTACTCACAGCATTACCAGAGACTGATGA 

CTGTGGCGGAGACGATTACAGCTCTCATGTTTCCTTTCCAGTGGCAGCATGT 

CTATGTCCCTATTCTCCCAGCTTCTCTCCTGCATTTCTTAGATGCTCCTGTTC 

CATACCTGATGGGTTTGCATTCCAATGGCCTGGATGACCGGTCAAAGCTGGA 

GCTGCCTCAAGAGGCTAACCTCTGCTTTGTGGACATTGACAACCACTTCATT 

GAGTTGCCAGAGGACTTGCCACAGTTCCCCAACAAATTGGAGTTTGTCCAGG 

AAGTCTCTGAGATTCTCATGGCATTTGGAATTCCCCCTGAAGGGAATCTTCAT 

TGCAGTGAGAGTGCCTCCAAGCTGAAGAGGCTGCGGGCCTCTGAGCTTGTC 

TCGGACAAGAGGAATGGGAACATTGCTGGCTCCCCTTTGCATTCCTACGAGC 

TTCTTAAGGAGAATGAAACTATTGCCCGGCTGCAAGCCTTGGTCAAGAGAAC 

TGGGGTGAGCCTGGAAAAGTTGGAAGTGCGTGAAGACCCCAGCAGCAATAA 

GGATCTCAAAGTTCAGTGTGATGAAGAAGAACTCAGGATTTACCAGCTAAAC 

ATTCAGATCCGGGAAGTTTTTGCAAATCGTTTCACTCAGATGTTTGCAGATTA 

TGAGGTGTTTGTCATCCAACCCAGCCAGGATAAGGAATCCTGGTTTACCAAC 

AGGGAGCAAATGCAAAACTTTGATAAAGCATCTTTTCTGTCAGATCAGCCTGA 

GCCCTACCTGCCCTTCCTCTCAAGATTCCTGGAGACCCAGATGTTTGCATCT 

TTCATTGACAACAAAATAATGTGTCATGATGATGATGATAAAGACCCTGTACT 

CCGGGTATTTGATTCCCGAGTTGACAAGATCAGGCTGTTGAATGTTCGGACA 

CCTACTCTCCGTACATCCATGTACCAGAAGTGTACCACTGTGGATGAAGCAG 

AGAAAGCAATTGAGCTGCGTCTGGCAAAAATTGACCATACTGCAATTCACCC 

ACATTTACTTGACATGAAGATTGGACAAGGGAAATATGAGCCGGGCTTCTTC 

CCTAAGCTGCAGTCTGATGTACTTTCCACTGGGCCAGCCAGCAACAAGTGGA 

CGAAAAGGAATGCCCCTGCCCAGTGGAGGCGGAAAGATCGGCAGAAGCAG 

CACACAGAACACCTGCGTTTAGATAATGACCAGAGGGAGAAGTACATCCAGG 

AA^rrflnnAr.TATRHRr.AnnACTATCCGCCAG 


TGGGATTCAACTAAAATTAGCAAAGCATACTACAAAGCAATGGTAATTAGCAC 
TTGGTGTTACTGGCTAAGAAAGAGGCACTTGATGCATGAAACAGACTCACGT 
AFHNTEVEAMQAESCYQL

ARSFHVQEDYDQAFQYYY 

QATQFASSSFVLPFFGLGQ 

MYIYRGDKENASQCFEKVL 

KAYPNNYETMKILGSLYAA 

SEDQEKRDIAKGHLKKVTE 

QYPDDVEAWIELAQILEQT 

DIQGALSAYGTATRILQEKV 

QADVPPEILNNVGALHFRL 

GNLGEAKKYFLASLDRAKA 

EAEHDEHYYNAISVTTSYN 

LARLYEAMCEFHEAEKLYK 

NILREHPNYVDCYLRLGAM 

ARDKGNFYEASDWFKEAL 

QINQDHPDAWSLIGNLHLA 

KQEWGPGQKKFERILKQP 

STQSDTYSMLALGNVWLQ 

TLHQPTRDREKEKRHQDR 

ALAIYKQVLRNDAKNLYAA

NGIGAVLAHKGYFREARDV 

FAQVREATADISDVWLNLA 

HIYVEQKQYISAVQMYENC 

LRKFYK 


LINYVGrlNYLrYUG 1 VAUU 
IVLRWKKPDIPRPIKINLLFPI 
lYLLFWAFLLVFSLWSEPVV 
CGIGLAIMLTGVPVYFLGVY 
WQHKPKCFSDFIELLTLVS 
QKMCVVVYPEVERGSGTE 
EANEDMEEQQQPMYQPTP 
TKDKDVAGQPQP* 
GCTTGAGCTGTTGTGGAATAAATACTGGGTGAATACGTTGAGTTCTTCTAGCT
TGGTTACTAATGC 


GGCAAATCACTTTTTCTTCAAAAAGGATTATAGTAAAGTCCAGCA 1 C 1 GGGUU 

TCCATGCATTCCATAATACAGAAGTGGAAGCTATGCAAGCAGAGAGCTGCTA 

TCAGCTAGCTAGATCATTCCATGTTCAGGAAGATTATGACCAAGCTTTTCAGT 

ACTATTATCAAGCCACACAGTTTGCCTCATCCTCTTTTGTGCTCCCATTTTTTG 

GTTTGGGACAAATGTATATTTATCGAGGTGACAAAGAAAATGCATCTCAGTGC 

TTTGAGAAGGTTTTGAAAGCTTATCCTAATAATTACGAAACTATGAAAATTCTC 

GGCTCTCTCTATGCTGCCTCAGAAGATCAAGAAAAACGAGATATTGCCAAGG 

GCCATTTGAAGAAGGTCACAGAACAGTATCCCGATGATGTTGAAGCTTGGAT 

TGAATTGGCACAAATCTTAGAACAGACTGATATACAGGGTGCCCTTTCAGCC 

TATGGAACAGCAACACGAATCCTTCAGGAGAAAGTGCAGGCCGATGTTCCTC 

CAGAGATTCTCAATAATGTGGGTGCCCTCCATTTTAGACTTGGAAACCTAGG 

GGAGGCTAAGAAATATTTTTTGGCGTCATTGGACCGTGCAAAAGCAGAAGCG 

GAACACGATGAGCATTACTATAACGCCATTTCCGTTACCACGTCATATAATCT 

CGCCAGGCTATATGAGGCGATGTGTGAATTCCATGAAGCAGAAAAACTGTAT 

AAAAACATCTTACGCGAACATCCTAATTATGTTGACTGCTATTTGCGCCTAGG 

AGCCATGGCTAGAGATAAGGGAAACTTTTATGAGGCTTCAGATTGGTTTAAG 

GAAGCTCTTCAGATTAATCAGGATCATCCAGATGCTTGGTCTTTGATTGGCAA 

TCTTCATTTGGCAAAACAAGAATGGGGTCCTGGGCAGAAGAAGTTTGAGAGG 

ATATTAAAACAGCCATCCACACAGAGTGATACCTATTCTATGCTAGCCCTTGG 

CAACGTGTGGCTCCAAACTTTACATCAGCCCACCCGAGATCGAGAAAAGGAA 

AAGCGTCATCAAGATCGTGCTCTGGCCATCTACAAACAAGTACTCAGAAATG 

ATGCAAAGAATCTGTATGCTGCCAATGGCATAGGAGCTGTTTTGGCCCACAA

AGGATATTTTCGTGAAGCTCGTGATGTATTTGCCCAAGTAAGAGAAGCAACA 

GCAGATATTAGTGATGTGTGGCTGAACTTAGCACACATCTATGTGGAGCAAA 

AGCAGTACATCAGCGCCGTTCAGATGTATGAAAACTGCCTCCGAAAGTTCTA 

TAAGCA 


CTC ATC A ACT ACGTG GG CTTC ATC AACT ACCTCTTCT ATG GGGGCACGG 1 1 G 
CTGGACAGATAGTCCTTCGCTGGAAGAAGCCTGATATCCCCCGCCCCATCAA 
GATCAACCTGCTGTTCCCCATCATCTACTTGCTGTTCTGGGCCTTCCTGCTG 
GTCTTCAGCCTGTGGTCAGAGCCGGTGGTGTGTGGCATTGGCCTGGCCATC 
ATGCTGACAGGAGTGCCTGTCTATTTCCTGGGTGTTTACTGGCAACACAAGC 
CCAAGTGTTTCAGTGACTTCATTGAGCTGCTAACCCTGGTGAGCCAGAAGAT 
GTGTGTGGTCGTGTACCCCGAGGTGGAGCGGGGCTCAGGGACAGAGGAGG 
CTAATGAGGACATGGAGGAGCAGCAGCAGCCCATGTACCAACCCACTCCCA 
rnAAGGAGAAGGACGTGGCGGGGCAGCCCCAGCCCTGA 


CTGGGATTACAGGCATGAGCCACAGCACCTGGCTGAGTTTTCTCAGCACCAT 
TTATTGAATAGACTGTCCTTTCCCTGGTGTATGTTATTGCATTTGTTGAAAATG 
TCACCGCCAGCGCCTGCTGCAGGAAGCCGCTGCCAACTGGCACTCCTGGCT
CATCAAGGTGCAGAAGATGAAAGCTGTCTACCACATCCTGAACATGTGCAAC 
ATCGACGTCACCCAGCAGTGTGTCATCGCCGAGATCTGGTTCCCGGTGGCA 
GATGCCACACGTATCAAGAGGGCACTGGAGCAAGGCATGGAACTAAGTGGC 
TCCTCCATGGCCCCCATCATGACCACAGTGCAATCTAAAACAGCCCCTCCCA 


ATGGGAGTGACATGGGACTTCAGCATGAGCAATGGAGGGUUUUU 1 UUUAA 

GACCTATGCTTTCAAGGGGGACTATGTGTGGACTGTATCAGATTCAGGACCG 

GGCCCCTTGTTCCGAGTGTCTGCCCTTTGGGAGGGGCTCCCCGGAAACCTG 

GATGCTGCTGTCTACTCGCCTCGAACACAATGGATTCACTTCTTTAAGGGAG 

ACAAGGTGTGGCGCTACATTAATTTCAAGATGTCTCCTGGCTTCCCCAAGAA 

GCTGAATAGGGTAGAACCTAACCTGGATGCAGCTCTCTATTGGCCTCTCAAC 

CAAAAGGTGTTCCTCTTTAAGGGCTCCGGGTACTGGCAGTGGGACGAGCTA 

GCCCGAACTGACTTCAGCAGCTACCCCAAACCAATCAAGGGTTTGTTTACGG 

GAGTGCCAAACCAGCCCTCGGCTGCTATGAGTTGGCAAGATGGCCGAGTCT 

ACTTCTTCAAGGGCAAAGTCTACTGGCGCCTCAACCAGCAGCTTCGAGTAGA 

GAAAGGCTATCCCAGAAATATTTCCCACAACTGGATGCACTGTCGTCCCCGG 

ACTATAGACACTACCCCATCAGGTGGGAATACCACTCCCTCAGGTACGGGCA 

TAACCTTGGATACCACTCTCTCAGCCACAGAAACCACGTTTGAATACTGA 


AGACCAGAGCCATGTTGTTCAAGAGCATTTAAGTGAAGAAAAGGA 1 GAAAUA 

CTACACTGTGAGAATAATGATAAAGCCCCTGAATCAGAGTCAGAGAAGCCAA 

CTCCTCTGTCCACTGGGCAAGGTAATAGAGCTGAAGAGGGACCAAACGCTA 

GTTCAGGTTTCATGAAGACTGCTGTACTAGGACCTACACTGAAAAATGTAATG 

ATGAAAAATAATAAACTAGCAGTTTCCCCTAACTATAATGCTACGTTTATGGG 

CTTCAAGATGATGGATGGAAAACAGCATATTGTATTAAAATTGGTGCCTATCA 

AACAAAATGTATGTTCACCAGGCTCACAGTCAGGTGCTGCAAAGGACGGTAC 

TGCTAATTTGCAGCCCCAGACTTTGGACACTAATGGATTTTTAACAGGAGTAA

CAACTGAGTTAAATGACACAGTTTATATGAAAGCAGCTACTCCATTTTCATGT 

TCATCTTCTATACTTTCAGGGAAAGCAAGTTCAGAAAAAGAAATGACTTTGAT 

ATCTCAAAGGAATAATATGCTTCAAACAATGGATTATGAGAAAAGTGTATCTT 

CTTTGTCAGCAACATCAGAATTGGTTACAGCATCAGTGAATTTGACCACAAAA 

TTTGAAACAAGAGATAATGTTGACTTCTGGGGAAATCATCTCACTCAGAGTCA 

CCCCGAGGTATTAGGTACCACCATTAAAAGTCCAGATAAAGTCAACTGTGTT 

GCCAAACCAAATGCATACAACAGTGGAGATATGCATAATTATTGCATTAATTA 

TGGCAACTGTGAGTTACCTGTTGAATCCTCCAACCAAGGATCATTACCTTTTC 

ATAATTACTCAAAAGTGAATAATTCTAATAAACGTCGTAGGTTTTCAGGAACA 

GCAGTGTATGAAAACCCTCAAAGAGAATCTTCATCCAGCAAAACAGTTGTCC 

AACAACCAATTAGTGAATCATTTTTATCACTAGTGAGGCAGGAGAGCTCAAAA 

CCAGATAGCCTATTAGCATCTATTAGCCTTTTAAATGATAAAGATGGAACTTT 
GATTGATGCCAAGTCCTTTGCCCGTATCAAGAAGTGGCTGGAGCACGCGCG
CTCCTCGCCCAGCCTCACCATCCTGGCTGGGGGCAAGTGTGATGACTCCGT 
GGGCTACTTTGTGGAGCCCTGCATCGTGGAGAGCAAGGACCCTCAGGAGCC 
CATCATGAAGGAGGAGATCTTCGGGCCTGTACTGTCTGTGTACGTCTACCCG 
GACGACAAGTACAAGGAGACGCTGCAGCTGGTTGACAGCACCACCAGCTAT 
GGCCTCACGGGGGCAGTGTTCTCCCAGGATAAGGACGTCGTGCAGGAGGC 
CACAAAGGTGCTGAGGAATGCTGCCGGCAACTTCTACATCAACGACAAGTCC 
ACTGGCTCGATAGTGGGCCAGCAGCCCTTTGGGGGGGCCCGAGCCTCTGG 
AACCAATGACAAGCCAGGGGGCCCACACTACATCCTGCGCTGGACGTCGCC 
GCAGGTCATCAAGGAGACACATAAGCCCCTGGGGGACTGGAGCTACGCGTA 


AACAGAGCTGCCTCCTGGCTCTTTGGGAGCCTGGGAGGAGAAGGAGCUUU 

GAGGGGCGCTGCGGGGAAGCCACCTGCGGATTCACTGGCTGCTGCTCCGC 

CCAGGACTGCTAGCAAGCACGGAGGGCTGCCAGACCTGGGGCTCCCTGCT 

CCGTGCGTCAGGTTGGGGAAACCACCGTCTGCCCCAGACCCTGACCCAGGA 

ncnfinCTGGAGGAAGCTGGG 


ATGTCTCAGGCTGTGCAGACAAACGGAACTCAACCATTAAGCAAAACA 1 GGU 

AACTCAGTTTATATGAGTTACAACGAACACCTCAGGAGGCAATAACAGATGG 

CTTAGAAATTGTGGTTTCACCTCGAAGTCTACACAGTGAATTAATGTGCCCAA 

TTTGTTTGGATATGTTGAAGAACACCATGACTACAAAGGAGTGTTTACATCGT 

TTTTGTGCAGACTGCATCATCACAGCCCTTAGAAGTGGCAACAAAGAATGTC 

CTACCTGTCGGAAAAAACTAGTTTCCAAAAGATCACTAAGGCCAGACCCAAA 

CTTTGATGCACTCATCAGCAAAATTTATCCAAGTCGTGATGAGTATGAAGCTC 

ATCAAGAGAGAGTATTAGCCAGGATCAACAAGCACAATAATCAGCAAGCACT

CAGTCACAGCATTGAGGAAGGACTGAAGATACAGGCCATGAACAGACTGCA 

GCGAGGCAAGAAACAACAGATTGAAAATGGTAGTGGAGCAGAAGATAATGG 

TGACAGTTCACACTGCAGTAATGCATCCACACATAGCAATCAGGAAGCAGGC 

CCTAGTAACAAACGGACCAAAACATCTGATGATTCTGGGCTAGAGCTTGATA 

ATAACAATGCAGCAATGGCAATTGATCCAGTAATGGATGGTGCTAGTGAAAT 

TGAATTAGTATTCAGGCCTCATCCCACACTTATGGAAAAAGATGACAGTGCA 

CAGACGAGATACATAAAGACTTCTGGTAACGCCACTGTTGATCACTTATCCAA 

GTATCTGGCTGTGAGGTTAGCTTTAGAAGAACTTCGAAGCAAAGGTGAATCA 

AACCAGATGAACCTTGATACAGCCAGTGAGAAGCAGTATACCATTTATATAG 

CAACAGCCAGTGGCCAGTTCACTGTATTAAATGGCTCTTTTTCTTTGGAATTG 

GTCAGTGAGAAATACTGGAAAGTGAACAAACCCATGGAACTTTATTACGCAC 

r.TAnAAAnnAGCACAAATGA 


ATGTCCAAGCGGCACCGGTTGGACCTAGGGGAGGATTACCCCTCTGGCAAG 
AAGCGTGCGGGGACCGATGGGAAGGATCGAGATCGAGACCGGGATCGTGA 
AGATCGGTCTAAAGATCGAGACCGAGAACGTGATAGAGGAGATAGAGAGCG 
DISVAVATPRGLVVPVIRNV

EAMNFADIERTITELGEKAR 

KNELAIEDMDGGTFTISNG 

GVFGSLFGTPIINPPQSAIL 

GMHGIFDRPVAIGGKVEVR 

PMMYVALTYDHRLIDGREA 

VTFLRKIKAAVEDPRVLLLD 

L* 


AASRRLMKELEEIRKCGMK
NFRNIQVDEANLLTWQGLI 
VPDNPPYDKGAFRIEINFPA 
EYPFKPPKITFKTKIYHPNID 
EKGQVCLPVISAENWKPAT 
KTDQVIQSLIALVNDPQPEH 
NAEEFTKKYGEKRPVD*
ACGGATTAATAAGGAACTTAGTGATTTGGCCCGTGACCCTCCAGCACAATGT

TCTGCAGGTCCAGTTGGGGATGATATGTTTCATTGGCAAGCCACAATTATGG 

GACCTAATGACAGCCCATATCAAGGCGGTGTATTCTTTTTGACAATTCATTTT 

CCTACAGACTACCCCTTCAAACCACCTAAGGTTGCATTTACAACAAGAATTTA 

TCATCCAAATATTAACAGTAATGGCAGCATTTGTCTCGATATTCTAAGATCAC 

AGTGGTCGCCTGCTTTAACAATTTCTAAAGTTCTTTTATCCATTTGTTCACTGC 

TATGTGATCCAAACCCAGATGACCCCCTAGTGCCAGAGATTGCACGGATCTA 

TAAAACAGACAGAGATAAGTACAACAGAATATCTCGGGAATGGACTCAGAAG 

TATGCCATGTGA 


AACTGGTGCTGCTCCTGCTAAGGCCAAGCCGGCTGAAGCTCCTGC1 GC I GU 

AGCCCCAAAAGCAGAACCTACAGCAGCGGCAGTTCCTCCCCCTGCAGCACC 

CATACCCACTCAGATGCCACCGGTGCCCTCGCCCTCACAGCCTCCTTCTGG 

CAAACCTGTGTCTGCAGTAAAACCCACTGTTGCCCCACCACTAGCTGAGCCA 

GGAGCTGGCAAAGGTCTGCGTTCAGAACATCGGGAGAAAATGAACAGGATG 

CGGCAGCGCATTGCTCAGCGTCTGAAGGAGGCCCAGAATACATGTGCAATG 

CTGACAACTTTTAATGAGATTGACATGAGTAACATCCAGGAGATGAGGGCTC 

GGCACAAAGAGGCTTTTTTGAAGAAACATAACCTCAAACTAGGCTTCATGTC 

GGCATTTGTGAAGGCCTCAGCCTTTGCCTTGCAGGAACAGCCTGTTGTAAAT

GCAGTGATTGACGACACAACCAAAGAGGTGGTGTATAGGGATTATATTGACA 

TCAGTGTTGCAGTGGCCACCCCACGGGGTCTGGTGGTTCCAGTCATCAGGA 

ATGTGGAAGCTATGAATTTTGCAGATATTGAACGGACCATCACTGAACTGGG 

AGAGAAGGCCCGAAAGAATGAACTTGCCATTGAAGATATGGATGGCGGTAC 

CTTCACCATTAGCAATGGAGGCGTTTTTGGCTCGCTCTTTGGAACACCCATT 

ATCAACCCCCCTCAGTCTGCCATCCTGGGGATGCATGGCATCTTTGACAGGC 

CAGTGGCTATAGGAGGCAAGGTAGAGGTGCGGCCCATGATGTACGTGGCAC 

TGACCTATGATCACCGGCTGATTGATGGCAGAGAGGCTGTGACTTTCCTCCG 

r,AAAATCAAGGCAGCGGTAGAGGATCCCAGAGTCCTCCTCCTGGATCTTTAG 


GGCGGCCAGCAGGAGGCTGATGAAGGAGCTTGAAGAAATCCGCAAATGTGG
GATGAAAAACTTCCGTAACATCCAGGTTGATGAAGCTAATTTATTGACTTGGC 
AAGGGCTTATTGTTCCTGACAACCCTCCATATGATAAGGGAGCCTTCAGAAT 
CGAAATCAACTTTCCAGCAGAGTACCCATTCAAACCACCGAAGATCACATTTA 
AAACAAAGATCTATCACCCAAACATCGACGAAAAGGGGCAGGTCTGTCTGCC 
AGTAATTAGTGCCGAAAACTGGAAGCCAGCAACCAAAACCGACCAAGTAATC 
CAGTCCCTCATAGCACTGGTGAATGACCCCCAGCCTGAGCACCCGCTTCGG 
GCTGACCTAGCTGAAGAATACTCTAAGGACCGTAAAAAATTCTGTAAGAATG 
CTGAAGAGTTTACAAAGAAATATGGGGAAAAGCGACCTGTGGACTAA 


ATGATGGCGAGCATGCGAGTGGTGAAGGAGCTGGAGGATCTTCAGAAGAAG 
CCTCCCCCATACCTGCGGAACCTGTCCAGCGATGATGCCAATGTCCTGGTG 
TGGCACGCTCTCCTCCTACCCGACCAACCTCCCTACCACCTGAAAGCCTTCA 
ACCTGCGCATCAGCTTCCCGCCGGAGTATCCGTTCAAGCCTCCCATGATCAA

ATTCACAACCAAGATCTACCACCCCAACGTGGACGAGAACGGACAGATTTGC 

CTGCCCATCATCAGCAGTGAGAACTGGAAGCCTTGCACCAAGACTTGCCAA 

GTCCTGGAGGCCCTCAATGTGCTGGTGAATAGACCGAATATCAGGGAGCCC 

CTGCGGATGGACCTCGCTGACCTGCTGACACAGAATCCGGAGCTGTTCAGA 

AARAATfinr.RAARAGTTCACCCTCCGATTCGGAGTGGACCGGCCCTCCTAA 


ATGTCAGTTGGGCACAAGGCCCAGG AGAGCAAGATTCGA 1 ACAAAAUUAA I 

GAACCTGTGTGGGAGGAAAACTTCACTTTCTTCATTCACAATCCCAAGCGCC 

AGGACCTTGAAGTTGAGGTCAGAGACGAGCAGCACCAGTGTTCCCTGGGGA 

ACCTGAAGGTCCCCCTCAGCCAGCTGCTCACCAGTGAGGACATGACTGTGA 

GCCAGCGCTTCCAGCTCAGTAACTCGGGTCCAAACAGCACCATCAAGATGA 

ArtATTnnnrrrfiOGGGTGCTCCATCTCGAAAAGCGAGAAAGGCCTCCAGACC 


CTGGGATGCCCTCAAGGCTGCCGCCTATGCTGCTGAAGCCAACGAUUAUUA 
GCTGGCCCAGGCCATCCTGGATGGAGCCAGCATCACCCTGCCTCATGGCAC 
CCTCTGTGAATGCTACGATGAGCTGGGCAATCGCTACCAGCTGCCCATCTAC 
TGCCTGTCACCGCCGGTGAACCTGCTGCTGGAGCACACGGAGGAGGAGAG 
CCTGGAGCCCCCCGAGCCTCCACCCAGCGTGCGCCGTGAGTTCCCGCTGA 
AGGTGCGCCTGTCCACGGGCAAGGACGTGAGGCTCAGCGCCAGCCTGCCC 
GACACAGTGGGGCAGCTCAAGAGGCAGCTGCACGCCCAGGAGGGCATCGA 
GCCATCGTGGCAGCGGTGGTTCTTCTCCGGGAAGCTGCTCACAGACCGCAC 
ACGGCTCCAGGAGACCAAGATCCAGAAAGATTTTGTCATCCAGGTCATCATC 

1 AAP. 


CGTCTGTGCCGTCTGCCGCAAGAAGTTCGTCAGCTCCA 1 GAGGC 1 GCGUAU 
CCACATCAAAGAGGTGCACGGGGCTGCCCAGGAGGCCTTGGTCTTCACCAG 
TTCCATCAACCAGAGCTTCTGCCTCCTGGAACCTGGTGGGGACATCCAGCAA 
GAAGCTCTGGGGGACCAGCTACAGCTGGTGGAAGAGGAGTTTGCCCTCCAG 


GAGAATCCACAAGGAATTGAATGATCTGGCACGGGACCCTCCAGCACJAU I b 

TTCAGCAGGTCCTGTTGGAGATGATATGTTCCATTGGCAAGCTACAATAATG 

GGGCCAAATGACAGTCCCTATCAGGGTGGAGTATTTTTCTTGACAATTCATTT 

CCCAACAGATTACCCCTTCAAACCACCTAAGGTTGCATTTACAACAAGAATTT 

ATCATCCAAATATTAACAGTAATGGCAGCATTTGTCTTGATATTCTACGATCA 

CAGTGGTCTCCAGCACTAACTATTTCAAAAGTACTCTTGTCCATCTGTTCTCT 

GTTGTGTGATCCCAATCCAGATGATCCTTTAGTGCCTGAGATTGCTCGGATC 

TACAAAACAGATAGAGAAAAGTACAACAGAATAGCTCGGGAATGGACTCAGA 

A^TATnr.nATfiTAA 


ATGGGAATTGGTCTTTCTGCTCAAGGTGTGAACATGAATAGAC 1 AUUAUU l l 

GGGATAAGCATTCATATGGTTACCATGGGGATGATGGACATTCGTTTTGTTCT 

TCTGGAACTGGACAACCTTATGGACCAACTTTCACTACTGGTGATGTCATTG 